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Series Introduction 



Over the past 50 years, digital signal processing has evolved as a major 
engineering discipline. The fields of signal processing have grown from the 
origin of fast Fourier transform and digital filter design to statistical spectral 
analysis and array processing, and image, audio, and multimedia processing, 
and shaped developments in high-performance VLSI signal processor 
design. Indeed, there are few fields that enjoy so many applications — signal 
processing is everywhere in our lives. 

When one uses a cellular phone, the voice is compressed, coded, and 
modulated using signal processing techniques. As a cruise missile winds 
along hillsides searching for the target, the signal processor is busy proces- 
sing the images taken along the way. When we are watching a movie in 
FIDTV, millions of audio and video data are being sent to our homes and 
received with unbelievable fidelity. When scientists compare DNA samples, 
fast pattern recognition techniques are being used. On and on, one can see 
the impact of signal processing in almost every engineering and scientific 
discipline. 

Because of the immense importance of signal processing and the fast- 
growing demands of business and industry, this series on signal processing 
serves to report up-to-date developments and advances in the field. The 
topics of interest include but are not limited to the following: 

Signal theory and analysis 
Statistical signal processing 
Speech and audio processing 
Image and video processing 
Multimedia signal processing and technology 
Signal processing for communications 
Signal processing architectures and VLSI design 
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I hope this series will provide the interested audience with high-quality, 
state-of-the-art signal processing literature through research monographs, 
edited books, and rigorously written textbooks by experts in their fields. 

K. J. Ray Liu 
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Preface 



The main idea behind this book, and the incentive for writing it, is that 
strong connections exist between adaptive filtering and signal analysis, to 
the extent that it is not realistic — at least from an engineering point of 
view — to separate them. In order to understand adaptive filters well enough 
to design them properly and apply them successfully, a certain amount of 
knowledge of the analysis of the signals involved is indispensable. 
Conversely, several major analysis techniques become really efficient and 
useful in products only when they are designed and implemented in an 
adaptive fashion. This book is dedicated to the intricate relationships 
between these two areas. Moreover, this approach can lead to new ideas 
and new techniques in either field. 

The areas of adaptive filters and signal analysis use concepts from several 
different theories, among which are estimation, information, and circuit 
theories, in connection with sophisticated mathematical tools. As a conse- 
quence, they present a problem to the application-oriented reader. However, 
if these concepts and tools are introduced with adequate justification and 
illustration, and if their physical and practical meaning is emphasized, they 
become easier to understand, retain, and exploit. The work has therefore 
been made as complete and self-contained as possible, presuming a back- 
ground in discrete time signal processing and stochastic processes. 

The book is organized to provide a smooth evolution from a basic knowl- 
edge of signal representations and properties to simple gradient algorithms, 
to more elaborate adaptive techniques, to spectral analysis methods, and 
finally to implementation aspects and applications. The characteristics of 
determinist, random, and natural signals are given in Chapter 2, and funda- 
mental results for analysis are derived. Chapter 3 concentrates on the cor- 
relation matrix and spectrum and their relationships; it is intended to 
familiarize the reader with concepts and properties that have to be fully 
understood for an in-depth knowledge of necessary adaptive techniques in 
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engineering. The gradient or least mean squares (LMS) adaptive filters are 
treated in Chapter 4. The theoretical aspects, engineering design options, 
finite word-length effects, and implementation structures are covered in 
turn. Chapter 5 is entirely devoted to linear prediction theory and techni- 
ques, which are crucial in deriving and understanding fast algorithms opera- 
tions. Fast least squares (FLS) algorithms of the transversal type are derived 
and studied in Chapter 6, with emphasis on design aspects and performance. 
Several complementary algorithms of the same family are presented in 
Chapter 7 to cope with various practical situations and signal types. 

Time and order recursions that lead to FLS lattice algorithms are pre- 
sented in Chapter 8, which ends with an introduction to the unified geo- 
metric approach for deriving all sorts of FLS algorithms. In other areas of 
signal processing, such as multirate filtering, it is known that rotations 
provide efficiency and robustness. The same applies to adaptive filtering, 
and rotation based algorithms are presented in Chapter 9. The relationships 
with the normalized lattice algorithms are pointed out. The major spectral 
analysis and estimation techniques are described in Chapter 10, and the 
connections with adaptive methods are emphasized. Chapter 1 1 discusses 
circuits and architecture issues, and some illustrative applications, taken 
from different technical fields, are briefly presented, to show the significance 
and versatility of adaptive techniques. Finally, Chapter 12 is devoted to the 
field of communications, which is a major application area. 

At the end of several chapters, FORTRAN listings of computer subrou- 
tines are given to help the reader start practicing and evaluating the major 
techniques. 

The book has been written with engineering in mind, so it should be most 
useful to practicing engineers and professional readers. However, it can also 
be used as a textbook and is suitable for use in a graduate course. It is worth 
pointing out that researchers should also be interested, as a number of new 
results and ideas have been included that may deserve further work. 

I am indebted to many friends and colleagues from industry and research 
for contributions in various forms and I wish to thank them all for their 
help. For his direct contributions, special thanks are due to J. M. T. 
Romano, Professor at the University of Campinas in Brazil. 

Maurice G. Bellcmger 
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1 

Adaptive Filtering and Signal 
Analysis 



Digital techniques are characterized by flexibility and accuracy, two proper- 
ties which are best exploited in the rapidly growing technical field of adap- 
tive signal processing. 

Among the processing operations, linear filtering is probably the most 
common and important. It is made adaptive if its parameters, the coeffi- 
cients, are varied according to a specified criterion as new information 
becomes available. That updating has to follow the evolution of the system 
environment as fast and accurately as possible, and, in general, it is asso- 
ciated with real-time operation. Applications can be found in any technical 
field as soon as data series and particularly time series are available; they are 
remarkably well developed in communications and control. 

Adaptive filtering techniques have been successfully used for many years. 
As users gain more experience from applications and as signal processing 
theory matures, these techniques become more and more refined and sophis- 
ticated. But to make the best use of the improved potential of these techni- 
ques, users must reach an in-depth understanding of how they really work, 
rather than simply applying algorithms. Moreover, the number of algo- 
rithms suitable for adaptive filtering has grown enormously. It is not unu- 
sual to find more than a dozen algorithms to complete a given task. Finding 
the best algorithm is a crucial engineering problem. The key to properly 
using adaptive techniques is an intimate knowledge of signal makeup. That 
is why signal analysis is so tightly connected to adaptive processing. In 
reality, the class of the most performant algorithms rests on a real-time 
analysis of the signals to be processed. 



y 
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Conversely, adaptive techniques can be efficient instruments for perform- 
ing signal analysis. For example, an adaptive filter can be designed as an 
intelligent spectrum analyzer. 

So, for all these reasons, it appears that learning adaptive filtering goes 
with learning signal analysis, and both topics are jointly treated in this book. 

First, the signal analysis problem is stated in very general terms. 



By definition a signal carries information from a source to a receiver. In the 
real world, several signals, wanted or not, are transmitted and processed 
together, and the signal analysis problem may be stated as follows. 

Let us consider a set of N sources which produce N variables 
Xq, x\, . . . , x N _i and a set of N corresponding receivers which give N vari- 
ables yo,yi, . . . , >’,v_i , as shown in Figure 1.1. The transmission medium is 
assumed to be linear, and every receiver variable is a linear combination of 
the source variables: 



The parameters are the transmission coefficients of the medium. 



1.1. SIGNAL ANALYSIS 





(1.1) 





"Vh n-i 



SOURCES 
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FIG. 1.1 A transmission system of order N. 
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Now the problem is how to retrieve the source variables, assumed to 
carry the useful information looked for, from the receiver variables. It 
might also be necessary to find the transmission coefficients. Stated as 
such, the problem might look overly ambitious. It can be solved, at least 
in part, with some additional assumptions. 

For clarity, conciseness, and thus simplicity, let us write equation (1.1) in 
matrix form: 

Y — MX (1.2) 

with 





x o 




To 




X\ 


, 7 = 


yi 




_ X N - 1 _ 




_Yn - i _ 



M — 



w 00 w 0 i 

«? m m n 



m o N _i 
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m N - 10 
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Now assume that the x t are random centered uncorrelated variables and 
consider the N x N matrix 



YY’^MXX'M 1 (1.3) 

where M 1 denotes the transpose of the matrix M. Taking its mathematical 
expectation and noting that the transmission coefficients are deterministic 
variables, we get 

E[ YY r ] = ME[XX‘]M' (1.4) 

Since the variables x,-(0 < i ^ N — 1) are assumed to be uncorrelated, the 
N x N source matrix is diagonal: 



E[XX‘] = 


r^o 

0 


0 ■■ 


0 ' 
0 


= diag [P Xo ,P Xi ,...,P XN J 


where 


_ 0 


0 ■■ 


. p 

X N -\ J 




P, - P[xi] 
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is the power of the source with index i. Thus, a decomposition of the receiver 
covariance matrix has been achieved: 

E[ YY>] = M diag [P Xo , P Xl , . . . , P XN , ]M' (1.5) 

Finally, it appears possible to get the source powers and the transmission 
matrix from the diagonalization of the covariance matrix of the receiver 
variables. In practice, the mathematical expectation can be reached, under 
suitable assumptions, by repeated measurements, for example. It is worth 
noticing that if the transmission medium has no losses, the power of the 
sources is transferred to the receiver variables in totality, which corresponds 
to the relation MM' — I N ; the transmission matrix is unitary in that case. 

In practice, useful signals are always corrupted by unwanted externally 
generated signals, which are classified as noise. So, besides useful signal 
sources, noise sources have to be included in any real transmission system. 
Consequently, the number of sources can always be adjusted to equal the 
number of receivers. Indeed, for the analysis to be meaningful, the number 
of receivers must exceed the number of useful sources. 

The technique presented above is used in various fields for source detec- 
tion and location (for example, radio communications or acoustics); the set 
of receivers is an array of antennas. However, the same approach can be 
applied as well to analyze a signal sequence when the data y(n) are linear 
combinations of a set of basic components. The problem is then to retrieve 
these components. It is particularly simple when y(n) is periodic with period 
N, because then the signal is just a sum of sinusoids with frequencies that are 
multiples of \/N, and the matrix M in decomposition (1.5) is the discrete 
Fourier transform (DFT) matrix, the diagonal terms being the power spec- 
trum. For an arbitrary set of data, the decomposition corresponds to the 
representation of the signal as sinusoids with arbitrary frequencies in noise; 
it is a harmonic retrieval operation or a principal component analysis pro- 
cedure. 

Rather than directly searching for the principal components of a signal to 
analyze it, extract its information, condense it, or clear it from spurious 
noise, we can approximate it by the output of a model, which is made as 
simple as possible and whose parameters are attributed to the signal. But to 
apply that approach, we need some characterization of the signal. 



1.2. CHARACTERIZATION AND MODELING 

A straightforward way to characterize a signal is by waveform parameters. 
A concise representation is obtained when the data are simple functions of 
the index n. For example, a sinusoid is expressed by 
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x(n) — S sin(«&> + q > ) 



( 1 - 6 ) 



where S is the sinusoid amplitude, a> is the angular frequency, and <p is the 
phase. The same signal can also be represented and generated by the recur- 
rence relation 

x(ii) = (2 cos co)x(n — 1) — x(n — 2) (1.7) 

for n ^ 0, and the initial conditions 
x { — 1 ) = 5 sin (— at + cp) 
x(—2) — S sin(— 2<w + <p) 
x(n) = 0 for n < — 2 

Recurrence relations play a key role in signal modeling as well as in adaptive 
filtering. The correspondence between time domain sequences and recur- 
rence relations is established by the z-transform, defined by 

OO 

X(z) = x(n)z- n (1.8) 



Waveform parameters are appropriate for synthetic signals, but for prac- 
tical signal analysis the correlation function r(p), in general, contains the 
relevant characteristics, as pointed out in the previous section: 

rip) = E[x{n)x{n - p)\ (1.9) 

In the analysis process, the correlation function is first estimated and then 
used to derive the signal parameters of interest, the spectrum, or the recur- 
rence coefficients. 

The recurrence relation is a convenient representation or modeling of a 
wide class of signals, which are those obtained through linear digital filtering 
of a random sequence. For example, the expression 

N 

x(n) — e(n) — ^ a,x(« — i) (1.1 0) 

i= 1 

where e(«) is a random sequence or noise input, defines a model called 
autoregressive (AR). The corresponding filter is of the infinite impulse 
response (HR) type. If the filter is of the finite impulse response (FIR) 
type, the model is called moving average (MA), and a general filter FIR/ 
HR is associated to an ARMA model. 
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The coefficients a t in (1.10) are the FIR, or transversal, linear prediction 
coefficients of the signal x(n); they are actually the coefficients of the inverse 
FIR filter defined by 

N 

e(n) — ^ ajX(n — i), a 0 — 1 (1.11) 

i= 0 

The sequence e(n) is called the prediction error signal. The coefficients are 
designed to minimize the prediction error power, which, expressed as a 
matrix form equation is 

E[e 2 (n)\ = A'E[XX']A (1.12) 

So, for a given signal whose correlation function is known or can be 
estimated, the linear prediction (or AR modeling) problem can be stated as 
follows: find the coefficient vector A which minimizes the quantity 
A f E[XX'\A subject to the constraint a 0 — 1. In that process, the power 
of a white noise added to the useful input signal is magnified by the factor 
A’ A. 

To provide a link between the direct analysis of the previous section and 
AR modeling, and to point out their major differences and similarities, we 
note that the harmonic retrieval, or principal component analysis, corre- 
sponds to the following problem: find the vector A which minimizes the 
value A r E[XX'\A subject to the constraint A' A = 1. The frequencies of the 
sinusoids in the signal are then derived from the zeros of the filter with 
coefficient vector A. For deterministic signals without noise, direct analysis 
and AR modeling lead to the same solution; they stay close to each other for 
high signal-to-noise ratios. 

The linear prediction filter plays a key role in adaptive filtering because it 
is directly involved in the derivation and implementation of least squares 
(LS) algorithms, which in fact are based on real-time signal analysis by AR 
modeling. 



1.3. ADAPTIVE FILTERING 

The principle of an adaptive filter is shown in Figure 1.2. The output of a 
programmable, variable-coefficient digital filter is subtracted from a refer- 
ence signal y(n) to produce an error sequence e(ri), which is used in com- 
bination with elements of the input sequence x(ri), to update the filter 
coefficients, following a criterion which is to be minimized. The adaptive 
filters can be classified according to the options taken in the following 
areas: 
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FIG. 1.2 Principle of an adaptive filter. 



The optimization criterion 

The algorithm for coefficient updating 

The programmable filter structure 

The type of signals processed — mono- or multidimensional. 

The optimization criterion is in general taken in the LS family in order to 
work with linear operations. However, in some cases, where simplicity of 
implementation and robustness are of major concern, the least absolute 
value (LAV) criterion can also be attractive; moreover, it is not restricted 
to minimum phase optimization. 

The algorithms are highly dependent on the optimization criterion, and it 
is often the algorithm that governs the choice of the optimization criterion, 
rather than the other way round. In broad terms, the least mean squares 
(LMS) criterion is associated with the gradient algorithm, the LAV criterion 
corresponds to a sign algorithm, and the exact LS criterion is associated 
with a family of recursive algorithms, the most efficient of which are the fast 
least squares (FLS) algorithms. 

The programmable filter can be a FIR or IIR type, and, in principle, 
it can have any structure: direct form, cascade form, lattice, ladder, or 
wave filter. Finite word-length effects and computational complexity vary 
with the structure, as with fixed coefficient filters. But the peculiar point 
with adaptive filters is that the structure reacts on the algorithm com- 
plexity. It turns out that the direct-form FIR, or transversal, structure is 
the simplest to study and implement, and therefore it is the most 
popular. 

Multidimensional signals can use the same algorithms and structures as 
their monodimensional counterparts. However, computational complexity 
constraints and hardware limitations generally reduce the options to the 
simplest approaches. 



Copyright n 2001 by Marcel Dekker, Inc. All Rights Reserved. 





The study of adaptive filtering begins with the derivation of the normal 
equations, which correspond to the LS criterion combined with the FIR 
direct form for the programmable filter. 



1.4. NORMAL EQUATIONS 



In the following, we assume that real-time series, resulting, for example, 
from the sampling with period T — 1 of a continuous-time real signal, are 
processed. 

Let H(n) be the vector of the N coefficients A,(«) of the programmable 
filter at time n, and let X(n) be the vector of the N most recent input signal 
samples: 





h 0 (n) 




x(n) 


11(11) 


/?)i(«) 


, X(n) = 


x(n — 1 ) 




_h N _i(n)_ 




x(pi + 1 — N) 



(1.13) 



The error signal e(n) is 
e(n ) = y(n) — H'(n)X(n) 



(1.14) 



The optimization procedure consists of minimizing, at each time index, a 
cost function J(n), which, for the sake of generality, is taken as a weighted 
sum of squared error signal values, beginning after time zero: 

n 

m = E wn ~ P ly(P) ~ H\n)X(p)\ 2 (1.15) 

p = 1 



The weighting factor, W, is generally taken close to 1(0 -C W ^ 1). 

Now, the problem is to find the coefficient vector //(/;) which minimizes 
J(n). The solution is obtained by setting to zero the derivatives of J{n) with 
respect to the entries /?,(«) of the coefficient vector H(ri), which leads to 

J 2 wn ~ P \y(P) - V'(n)X(pm P ) = 0 ( 1 . 16 ) 

p=\ 

In concise form, (1.16) is 

H(n) = Rp(n)r yx (n) (1.17) 

with 



R N (n) = W n - p X(p)X t (p) 
p= i 



(1.18) 
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(1.19) 



r yx {n) = Y, W n - p X(p)y(p) 
p= i 

If the signals are stationary, let R xx be the N x N input signal autocorrela- 
tion matrix and let r be the vector of cross-correlations between input and 
reference signals: 

Rxx = E[X(p)X'(p)\, r yx = E[X(p)y(p)] (1.20) 

Now 

1 — W" 1 - W" 

R[R N (n)} = 73-^7 Rxx, E[r yx (n)\ = j—^r yx (1.21) 

So R N (f 7) is an estimate of the input signal autocorrelation matrix, and r vx (n) 
is an estimate of the cross-correlation between input and reference signals. 
The optimal coefficient vector H opl is reached when n goes to infinity: 

//opt = Rxx r yx (1.22) 

Equations (1.22) and (1.17) are the normal (or Yule-Walker) equations for 
stationary and evolutive signals, respectively. In adaptive filters, they can be 
implemented recursively. 



1.5. RECURSIVE ALGORITHMS 

The basic goal of recursive algorithms is to derive the coefficient vector 
H(n + 1) from H(n). Both coefficient vectors satisfy (1.17). In these equa- 
tions, autocorrelation matrices and cross-correlation vectors satisfy the 
recursive relations 

R N (n + 1) = WR N (n) + X(n + 1 )X‘(n + 1 ) (1 .23) 

r yx (n + 1) = Wr yx (n) + X(n + 1 )y(n + 1) (1.24) 

Now, 

H(n + 1) = R^ X {n + 1 )[ Wr yx (n) + X(n + 1 )y(n + 1)] 

But 

Wr yx (n) = [R N (n + 1) - X(n + 1 )X\n + 1 )\H(n) 
and 

H(n + 1) = H{n) + R^'in + 1 )X(n + 1 )\y(n + 1) - X\n + 1 )H(n)\ 



> 
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(1.25) 




which is the recursive relation for the coefficient updating. In that expres- 
sion, the sequence 

e(n + 1) = y(n + 1) - X\n + \)H(n) (1.26) 

is called the a priori error signal because it is computed by using the coeffi- 
cient vector of the previous time index. In contrast, (1.14) defines the a 
posteriori error signal s(n), which leads to an alternative type of recurrence 
equation 

H(n + 1) = H(n) + W~ x RT N \n)X(n + 1 )e(n + 1) (1.27) 

For large values of the filter order N, the matrix manipulations in (1.25) 
or (1.27) lead to an often unacceptable hardware complexity. We obtain a 
drastic simplification by setting 

R \ (n + 1) Sim 

where I N is the (N x N) unity matrix and 5 is a positive constant called the 
adaptation step size. The coefficients are then updated by 

H(n + 1) = H(n) + SX(n + 1 )e(n + 1 ) (1 .28) 

which leads to just doubling the computations with respect to the fixed- 
coefficient filter. The optimization process no longer follows the exact LS 
criterion, but LMS criterion. The product X(n+ 1 )e(n+ 1) is proportional 
to the gradient of the square of the error signal with opposite sign, because 
differentiating equation (1.26) leads to 

- = 2 x(n + 1 - i)e(n +1), 0 < i < N - 1 (1.29) 

ah,{n) 

hence the name gradient algorithm. 

The value of the step size S has to be chosen small enough to ensure 
convergence; it controls the algorithm speed of adaptation and the residual 
error power after convergence. It is a trade-off based on the system engi- 
neering specifications. 

The gradient algorithm is useful and efficient in many applications; it is 
flexible, can be adjusted to all filter structures, and is robust against imple- 
mentation imperfections. However, it has some limitations in performance 
and weaknesses which might not be tolerated in various applications. For 
example, its initial convergence is slow, its performance depends on the 
input signal statistics, and its residual error power may be large. If one is 
prepared to accept an increase in computational complexity by a factor 
usually smaller than an order of magnitude (typically 4 or 5), then the 
exact recursive LS algorithm can be implemented. The matrix manipulations 
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can be avoided in the coefficient updating recursion by introducing the 
vector 

G(n) = RJf(n)X(ri) (1.30) 

called the adaptation gain, which can be updated with the help of linear 
prediction filters. The corresponding algorithms are called FLS 

algorithms. 

Up to now, time recursions have been considered, based on the cost 
function J(n) defined by equation (1.15) for a set of N coefficients. It is 
also possible to work out order recursions which lead to the derivation of 
the coefficients of a filter of order N + 1 from the set of coefficients of a 

filter of order N. These order recursions rely on the introduction of a 

different set of filter parameters, called the partial correlation 

(PARCOR) coefficients, which correspond to the lattice structure for the 
programmable filter. Now, time and order recursions can be combined in 
various ways to produce a family of LS lattice adaptive filters. That 
approach has attractive advantages from the theoretical point of view — 
for example, signal orthogonalization, spectral whitening, and easy control 
of the minimum phase property — and also from the implementation point 
of view, because it is robust to word-length limitations and leads to flexible 
and modular realizations. 

The recursive techniques can easily be extended to complex and multi- 
dimensional signals. Overall, the adaptive filtering techniques provide a wide 
range of means for fast and accurate processing and analysis of signals. 



1.6. IMPLEMENTATION AND APPLICATIONS 

The circuitry designed for general digital signal processing can also be used 
for adaptive filtering and signal analysis implementation. However, a few 
specificities are worth point out. First, several arithmetic operations, such as 
divisions and square roots, become more frequent. Second, the processing 
speed, expressed in millions of instructions per second (MIPS) or in millions 
of arithmetic operations per second (MOPS), depending on whether the 
emphasis is on programming or number crunching, is often higher than 
average in the field of signal processing. Therefore specific efficient archi- 
tectures for real-time operation can be worth developing. They can be spe- 
cial multibus arrangements to facilitate pipelining in an integrated processor 
or powerful, modular, locally interconnected systolic arrays. 

Most applications of adaptive techniques fall into one of two broad 
classes: system identification and system correction. 
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FIG. 1.3 Adaptive filter for system identification. 



The block diagram of the configuration for system identification is shown 
in Figure 1.3. The input signal x(n) is fed to the system under analysis, which 
produces the reference signal y(n). The adaptive filter parameters and spe- 
cifications have to be chosen to lead to a sufficiently good model for the 
system under analysis. That kind of application occurs frequently in auto- 
matic control. 

System correction is shown in Figure 1.4. The system output is the adap- 
tive filter input. An external reference signal is needed. If the reference signal 
y{ri) is also the system input signal u(ri), then the adaptive filter is an inverse 
filter; a typical example of such a situation can be found in communications, 
with channel equalization for data transmission. In both application classes, 
the signals involved can be real or complex valued, mono- or multidimen- 
sional. Although the important case of linear prediction for signal analysis 
can fit into either of the aforementioned categories, it is often considered as 
an inverse filtering problem, with the following choice of signals: 
y(n) — 0, u(n) — e(n) . 




FIG. 1.4 Adaptive filter for system correction. 
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Another field of applications corresponds to the restoration of signals 
which have been degraded by addition of noise and convolution by a known 
or estimated filter. Adaptive procedures can achieve restoration by decon- 
volution. 

The processing parameters vary with the class of application as well as 
with the technical fields. The computational complexity and the cost effi- 
ciency often have a major impact on final decisions, and they can lead to 
different options in control, communications, radar, underwater acoustics, 
biomedical systems, broadcasting, or the different areas of applied physics. 



1.7. FURTHER READING 

The basic results, which are most necessary to read this book, in signal 
processing, mathematics, and statistics are recalled in the text as close as 
possible to the place where they are used for the first time, so the book is, to 
a large extent, self-sufficient. However, the background assumed is a work- 
ing knowledge of discrete-time signals and systems and, more specifically, 
random processes, discrete Fourier transform (DFT), and digital filter prin- 
ciples and structures. Some of these topics are treated in [1], Textbooks 
which provide thorough treatment of the above-mentioned topics are [2- 
4]. A theoretical veiw of signal analysis is given in [5], and spectral estima- 
tion techniques are described in [6]. Books on adaptive algorithms include 
[7-9]. Various applications of adaptive digital filters in the field of commu- 
nications are presented in [10-1 1]. 
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2 

Signals and Noise 



Signals carry information from sources to receivers, and they take many 
different forms. In this chapter a classification is presented for the signals 
most commonly used in many technical fields. 

A first distinction is between useful, or wanted, signals and spurious, or 
unwanted, signals, which are often called noise. In practice, noise sources 
are always present, so any actual signal contains noise, and a significant part 
of the processing operations is intended to remove it. However, useful sig- 
nals and noise have many features in common and can, to some extent, 
follow the same classification. 

Only data sequences or time series are considered here, and the leading 
thread for the classification proposed is the set of recurrence relations, which 
can be established between consecutive data and which are the basis of 
several major analysis methods [1—3]. In the various categories, signals 
can be characterized by waveform functions, autocorrelation, and spectrum. 

An elementary, but fundamental, signal is introduced first — the damped 
sinusoid. 



2.1. THE DAMPED SINUSOID 

Let us consider the following complex sequence, which is called the damped 
complex sinusoid, or damped cisoid: 



y(n) = 



e (a+ja> 0 )n 

0 , 



n ^ 0 
n < 0 



( 2 . 1 ) 



> 
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where a and &> 0 are real scalars. 

The z-transform of that sequence is, by definition 

OO 

Y(Z) = Y,yW) z ~ n ( 2 - 2 ) 

n= o 



Hence 



Y(z) = 



1 

1 _ e (a+jo>o) z -l 



(2.3) 



The two real corresponding sequences are shown in Figure 2.1(a). They are 
yin) = y R (n ) +jy,(n) (2.4) 

with 



y R (n) — e an cos na> 0 , yi(ri) — e an sinn&> 0 , n ^ 0 
The z-transforms are 



(2.5) 



Yr(z) = 



1 — (<?“ COS (W 0 )z 1 
1 — (2e“cosa> 0 )z _1 + e 2a z~ 2 



( 2 . 6 ) 



TO = 



1 — (e a sin cd 0 )z 1 
1 — (2e a cosft) 0 )z _1 + e 2a z~ 2 



(2-7) 



In the complex plane, these functions have a pair of conjugate poles, 
which are shown in Figure 2.1(b) for a < 0 and |a| small. From (2.6) and 
(2.7) and also by direct inspection, it appears that the corresponding signals 
satisfy the recursion 

y R (n) - 2e a cos co 0 y R (n - 1 ) + 3 2a y R (n - 2) = 0 (2.8) 

with initial values 

y R {- 1 ) = e~ a cos(-« 0 ), v«(-2) = e -2 " cos(-2« 0 ) (2.9) 

and 

yj(-l) = e ~ a sin(-w 0 ), yi (-2) = e 2a sin(-2o; 0 ) (2.10) 



More generally, the one-sided z-transform, as dehned by (2.2), of equa- 
tion (2.8) is 



Yr(z) = - 



b\V /;(—!) + bih'Rj— 2) + V/;( — 1 )- '] 
1 -f b\z ^ -f- b2Z 2 



with b\ — — 2e“cos&> and b 2 — e la . 



( 2 . 11 ) 
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FIG. 2.1 (a) Waveform of a damped sinusoid, (b) Poles of the z-transform of the 

damped sinusoid. 
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The above-mentioned initial values are then obtained by identifying 
(2.11) and (2.6), and (2.11) and (2.7), respectively. 

The energy spectra of the sequences j^(«) and y)j(n) are obtained from 
the z-transforms by replacing z by e J< ° [4]. For example, the function 1 7/(o>) | 
is shown in Figure 2.2; it is the frequency response of a purely recursive 
second-order biter section. 

As n grows to inhnity the signal y(n) vanishes; it is nonstationary. 
Damped sinusoids can be used in signal analysis to approximate the spec- 
trum of a bnite data sequence. 



2.2. PERIODIC SIGNALS 

Periodic signals form an important category, and the simplest of them is the 
single sinusoid, debned by 

x(n) — S sin (nco 0 + <p) (2.12) 

where S is the amplitude, co 0 is the radial frequency, and <p is the phase. 

For n ^ 0, the results of the previous section can be applied with a — 0. 
So the recursion 
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x(n) — 2 cos a> 0 x(n — 1) + x(n — 2) = 0 



(2.13) 



with initial conditions 



x( — 1) = S sin ( — coq -{- (p), x(—2) — Ssm(—2co 0 + (p) (2-14) 

is satisfied. The z-transform is 



X(z) = 5 



sin (p — sin(— a> 0 + <p)z 1 
1 — (2 cos &>o)z _ 1 + z~ 2 



(2.15) 



Now the poles are exactly on the unit circle, and we must consider the 
power spectrum. It cannot be directly derived from the z-transform. The 
sinusoid is generated for n > 0 by the purely recursive second-order filter 
section in Figure 2.3 with the above-mentioned initial conditions, the circuit 
input being zero. For a filter to cancel a sinusoid, it is necessary and suffi- 
cient to implement the inverse filter — that is, a filter which has a pair of zeros 
on the unit circle at the frequency of the sinusoid; such filters appear in 
linear prediction. 

The autocorrelation function (ACF) of the sinusoid, which is a real sig- 
nal, is defined by 



1 N - 1 

>ip) = Jim — V x(n)x(n - p) 
N—*oo JS 

n=0 

Hence, 



(2.16) 




x(n) 



FIG. 2.3 Second-order filter section to generate a sinusoid. 
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i s 2 ^! 

N~2 



rip ) = —cospa) 0 



lim , T „ 

N^-oo N 2 



cos 2 



2n — p 



«o + <P 



(2.17) 



and for any ® 0 , 

S 2 

rip) = —cospw 0 



(2.18) 



The power spectrum of the signal is the Fourier transform of the ACF; 
for the sinusoid it is a line with magnitude S 2 / 2 at frequency co 0 . 

Now, let us proceed to periodic signals. A periodic signal with period N 
consists of a sum of complex sinusoids, or cisoids, whose frequencies are 
integer multiples of 1 /N and whose complex amplitudes S k are given by the 
discrete Fourier transform (DFT) of the signal data: 



So " 




" 1 1 


1 




x(0) 


s. 


1 


1 W 


. . W »-1 




x(l) 


Sjv- i _ 


~N 


_ 1 W N ~ l ■ 






_x(N- 1) 



with W = e ~ K2n/N \ 

Following equation (2.3), with a — 0, we express the z-transform of the 
periodic signal by 



JV— 1 



*(*) = £ 

k = 0 



S k 

1 _ e j(2n/N)k z -\ 



( 2 . 20 ) 



and its poles are uniformly distributed on the unit circle as shown in Figure 
2.4 for N even. Therefore, the signal x(n) satisfies the recursion 

N 

^aAn - /) = 0 ( 2 . 21 ) 

i= 0 

where the a, are the coefficients of the polynomial P(z): 

P(z ) = J2 = HO - e K2n/N)k z~ x ) (2.22) 

;=o k= l 



So a 0 — 1, and if all the cisoids are present in the periodic signal, then a N — 
1 and cij — 0 for 1 < / < N — 1 . The N complex amplitudes, or the real 
amplitudes and phases, are defined by the N initial conditions. If some of 
the N possible cisoids are missing, then the coefficients take on values 
according to the factors in the product (2.22). 

The ACF of the periodic signal x(n) is calculated from the following 
expression, valid for complex data: 
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-J 



FIG. 2.4 Poles of a signal with period N. 



, N - 1 

r(p) = — Y] x(n)x(n - p) (2.23) 

n = 0 

where x(n) is the complex conjugate of x{n). According to the inverse DFT, 
x(ri) can be expressed from its frequency components by 

N - 1 

x(n) = S k e J(2n/N)kn (2.24) 

k= 0 

Now, combining (2.24) and (2.23) gives 

r(p) = E \S k \2 e j(2n/N)kp (2.25) 

A-0 

and, for x(n) a real signal and for the configuration of poles shown in Figure 
2.4 with N even, 

N/2—l p \ 

r(p) — ,S’o + S 2 N /2 + 2 ^ ,S\| 2 cos^— /t/)J (2.26) 

The corresponding spectrum is made of lines at frequencies which are 
integer multiples of 1 /N. 

The same analysis as above can be carried out for a signal composed of a 
sum of sinusoids with arbitrary frequencies, which just implies that the 



Copyright n 2001 by Marcel Dekker, Inc. All Rights Reserved. 




period N may grow to infinity. In that case, the roots of the polynomial 
P(z) take on arbitrary positions on the unit circle. Such a signal is said to be 
deterministic because it is completely determined by the recurrence relation- 
ship (2.21) and the set of initial conditions; in other words, a signal value at 
time n can be exactly calculated from the N preceding values; there is no 
innovation in the process; hence, it is also said to be predictable. 

The importance of P{z) is worth emphasizing, because it directly deter- 
mines the signal recurrence relation. Several methods of analysis primarily 
aim at finding out that polynomial for a start. 

The above deterministic or predictable signals have discrete power spec- 
tra. To obtain continuous spectra, one must introduce random signals. They 
bring innovation in the processes. 



2.3. RANDOM SIGNALS 



A random real signal x(n) is defined by a probability law for its amplitude at 
each time n. The law can be expressed as a probability density p(x, n ) defined 
by 



p(x, ri) 



Prob[.r ^ x(n) < x + A.v] 
a.y^o Ax 



(2.27) 



It is used to calculate, by ensemble averages, the statistics of the signal or 
process [5]. 

The signal is second order if it possesses a first-order moment m.\(n) called 
the mean value or expectation of x(n), denoted E[x(n)\ and defined by 

/ OO 

xp(x, n) dx (2.28) 

-OO 



and a second-order moment, called the covariance: 



E[x(ni)x{n 2 )\ = m 2 (n ] , n 2 ) = 




X\ x 2 p(x ] , x 2 \ n i , n 2 ) dx ] dx 2 



(2.29) 



where p(x l? x 2 ; , «i, n 2 ) is the joint probability density of the pair of random 
variables [x(«,). x(n 2 )\. 

The signal is stationary if its statistical properties are independent of the 
time index n — that is, if the probability density is independent of time n: 

p(x, n) — p(x) (2.30) 



The stationarity can be limited to the moments of first and second order. 
Then the signal is wide-sense stationary, and it is characterized by the fol- 
lowing equations: 
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/»oo 

E\x(n)\ — / xp(x) dx = m\ 

J — OO 


(2.31) 


E[x(n)x(n - p)] = rip) 


(2.32) 



The function r(p) is the (ACF) of the signal. 

The statistical parameters are, in general, difficult to estimate or measure 
directly, because of the ensemble averages involved. A reasonably accurate 
measurement of an ensemble average requires that many process realiza- 
tions be available or that the experiment be repeated many times, which is 
often impractical. On the contrary, time averages are much easier to come 
by, for time series. Therefore the ergodicity property is of great practical 
importance; it states that, for a stationary signal, ensemble and time 
averages are equivalent: 

m x = E[x(n)\ = lim 1 x(n) (2.33) 

N-t-oo ZJS + 1 L — 

n=—N 

l N 

r{p) = E[x(n)x(n - p)\ = lim V x(n)x(n - p) (2.34a) 

TV— >oo ZJS + 1 L — ' 
n=—N 

For complex signals, the ACF is 

1 N 

rip) = E[x(n)x(n - p)] = lim V x(n)x(n - p) (2.34b) 

N—>oo ZJS + 1 

The factor x{n — p) is replaced by its complex conjugate x(n — p)\ note that 
r( 0) is the signal power and is always a real number. 

In the literature, the factor x(n + p) is generally taken to define f(y>); 
however, we use x(n — p) throughout this book because it comes naturally 
in adaptive filtering. 

In some circumstances, moments of order k >2 might be needed. They 
are defined by 




and they can be calculated efficiently through the introduction of a function 
F(u ), called the characteristic function of the random variable x and defined 
by 

/ OO 

e jux p(x) dx (2.36) 

-OO 

Using definition (2.35), we obtain the series expansion 
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(2.37) 



OO 



F(u) = 

k = 0 



(juf 

k\ 



m k 



Since F(u) is the inverse Fourier transform of the probability density p(x ), it 
can be easy to calculate and can provide the high-order moments of the 
signal. 

The moment of order 4 is used in the definition of the kurtosis K x , or 
coefficient of flatness of a probability distribution 



E[x\n)\ 

E 2 [x\n)\ 



(2.38) 



For example, a binary symmetric distribution (±1 with equal probability) 
leads to K x — 1. For the Gaussian distribution of the next section, K x — 3, 
and for the exponential distribution 

/**) = — (2.39) 
o\/ 2 



K x = 9. 

An important concept is that of statistical independence of random vari- 
ables. Two random variables, X\ and x 2 , are independent if and only if their 
joint density p(x\, x 2 ) is the product of the individual probability densities: 

P(x i , x 2 ) = p(x ] )p(x 2 ) (2.40) 



which implies the same relationship for the characteristic functions: 



OO 

F(ii \ , u 2 ) = j j e^"' x ' +UlX2) p{ x | , x 2 ) dx\dx 2 



(2.41) 



and 



F( w | , u 2 ) = F(u\)F{u 2 ) (2.42) 

The correlation concept is related to linear dependency. Two noncorre- 
lated variables, such that E[x ix 2 ] = 0, have no linear dependency. But, in 
general, that does not mean statistical independency, since higher-order 
dependency can exist. 

Among the probability laws, the Gaussian law has special importance in 
signal processing. 

> 
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2.4. GAUSSIAN SIGNALS 



A random variable x is said to be normally distributed or Gaussian if its 
probability law has a density p(x) which follows the normal or Gaussian 
law: 



p(x) = — ' e- {x - m)2/2 ^ (2.43) 

<T t -\/ 27T 

The parameter m is the mean of the variable x; the variance a 2 is the 
second-order moment of the centered random variable (x — /;?); er A . is also 
called the standard deviation. 

The characteristic function of the centered Gaussian variable is 
F(u ) = e-^ 1 ' 2 (2.44) 

Now, using the series expansion (2.37), the moments are 
m 2k +\ = 0 



2 4 2/d 2k 

m 2 = <T X , m 4 = 3a x , m 2k = (2-45) 

The normal law can be generalized to multidimensional random vari- 
ables. The characteristic function of a ^-dimensional Gaussian variable 



x(x ] , x 2 , . . . , x k ) is 

F(u \ , ■ ■ ■ , u k) — ex P 



J k k 
9 r ij u i l 



i= 1 7=1 



(2.46) 



with r t j — E[XiXj\. 

If the variables are not correlated, then they are independent, because r y 
— 0 for i A j and u 2 , u k ) is the product of the characteristic func- 
tions. So noncorrelation means independence for Gaussian variables. 

A random signal x(n) is said to be Gaussian if, for any set of k time 
values /7,(1 ^ i ^ k ), the A-dimensional random variable .r = [x («i), jc(« 2 ), 
. . . , x(n k )] is Gaussian. According to (2.46), the probability law of that 
variable is completely defined by the ACF r(p) of x(n). The power spectral 
density S( f) is obtained as the Fourier transform of the ACF: 

OO 

S(f) = r(p)e~ j2npf (2.47) 

p=— OO 



or, since r(p) is an even function. 
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(2.48) 



OO 



S(f) — r(0) + 2 ^ r(p) cos(2jrp/) 



p= i 



If the data in the sequence jc(«) are independent, then r(/;) reduces to r(0) and 
the spectrum £(/) is flat; the signal is then said to be white. 

An important aspect of the Gaussian probability laws is that they pre- 
serve their character under any linear operation, such as convolution, filter- 
ing, differentiation, or integration. 

Therefore, if a Gaussian signal is fed to a linear system, the output is also 
Gaussian. Moreover, there is a natural trend toward Gaussian probability 
densities, because of the so-called central limit theorem, which states that the 
random variable 



where the x t are N independent identically distributed (i.i.d.) second-order 
random variables, becomes Gaussian when N grows to infinity. 

The Gaussian approximation can reasonably be made as soon as N 
exceeds a few units, and the importance of Gaussian densities becomes 
apparent because in nature many signal sources and, particularly, noise 
sources at the micro- or macroscopic levels add up to make the sequence 
to be processed. So Gaussian noise is present in virtually every signal pro- 
cessing application. 



In simulation, evaluation, transmission, test, and measurement, the data 
sequences used are often not natural but synthetic signals. They appear 
also in some analysis techniques, namely analysis by synthesis techniques. 

Deterministic signals can be generated in a straightforward manner as 
isolated or recurring pulses or as sums of sinusoids. A diagram to produce a 
single sinusoid is shown in Figure 2.3. Note that the sinusoids in a sum must 
have different phases; otherwise an impulse shape waveform is obtained. 

Flat spectrum signals are characterized by the fact that their energy is 
uniformly distributed over the entire frequency band. Therefore an 
approach to produce a deterministic white-noise-like waveform is to gener- 
ate a set of sinusoids uniformly distributed in frequency with the same 
amplitude but different phases. 

Random signals can be obtained from sequences of statistically indepen- 
dent real numbers generated by standard computer subroutines through a 




(2.49) 



2.5. SYNTHETIC, MOVING AVERAGE, AND 
AUTOREGRESSIVE SIGNALS 
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rounding process. The magnitudes of these numbers are uniformly distrib- 
uted in the interval (0, 1), and the sequences obtained have a flat spectrum. 

Several probability densities can be derived from the uniform distribu- 
tion. Let the Gaussian, Rayleigh, and uniform densities be p(x), p(y), and 
p(z), respectively. The Rayleigh density is 



P(j') = ^exp 
a 




(2.50) 



and the second-order moment of the corresponding random variable is 2er 2 , 
the mean is o^/tt/2, and the variance is (2 — jt/2)a 2 . It is a density associated 
with the peak values of a narrowband Gaussian signal. The changes of 
variables 



p{z) dz — dz — p(y) dy 
leads to 



dz 

dy 





Hence, 



z = exp 




and a Rayleigh sequence y(n) is obtained from a uniform sequence z(n) in 



the magnitude interval (0, 1) by the following operation: 

y(n) = cs^Jl ln[l /z(n)\ (2.51) 

Now, independent Rayleigh and uniform sequences can be used to derive a 
Gaussian sequence x(n): 

x(n) — y(n) cos[2jrz(n)] (2.52) 

In the derivation, a companion variable is introduced: 

x'(n) — y(n) sin 2 nz(n) (2.53) 

Now, let us consider the joint probability p(x, x') and apply the relation 
between rectangular and polar coordinates: 

p(x, x') dx dx' = p(x, x ')y dy dz — p(y)p(z) dy dz (2.54) 

Then 

p(x, x') = ^—p(y) = +x ),2a = p{x)p(x') (2.55) 

iTTV /.1T(T a 
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and finally 



p( x ) — — e ~^ ^ (2-56) 

(TV lit 

The two variables x(n) and x'(ri) have the same distribution and, considered 
jointly, they make a complex Gaussian noise of power 2er. The above 
derivation shows that this complex noise can be represented in terms of 
its modulus, which has a Rayleigh distribution, and its phase, which has a 
uniform distribution. 

Correlated random signals can be obtained by filtering a white sequence 
with either uniform or Gaussian amplitude probability density, as shown in 
Figure 2.5. The filter H(z) can take on different structures, corresponding to 
different models for the output signal [6], 

The simplest type is the finite impulse response (FIR) filter, correspond- 
ing to the so-called moving average (MA) model and defined by 

N 

H(z) = J2 h '- r ''' ( 2 - 57 ) 

/=o 

and, in the time domain, 

N 

x(n) — ^ hje(n — i ) (2.58) 

;=0 

where the /?, are the filter impulse response. 

The output signal ACF is obtained by direct application of definition 
(2.34), considering that 

E[e 2 (n)\ — (Tp, E[e(n)e(n — /')] = 0 for i A 0 

The result is 



Hp) = 



o, 



N-p 

’ E 

1=0 



Up hjh i+p . 



\P\ < N 
\p\ > N 



(2.59) 



e(n) 



white sequence 



H(z) 



x(n) 



correlated sequence 



FIG. 2.5 Generation of a correlated random signal. 
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Several remarks are necessary. First, the ACF has a finite length in 
accordance with the filter impulse response. Second, the output signal 
power a 2 is related to the input signal power by 

N 

oi = r(0) = a 2 e J2ti (2-60) 

i = 0 



Equation (2.60) is frequently used in subsequent sections. The power 
spectrum can be computed from the ACF r(p) by using equation (2.48), 
but another approach is to use H(z), since it is available, via the equation 



S(f) = ^ 



i= 0 



(2.61) 



An infinite impulse response (HR) filter corresponds to an autoregressive 
(AR) model. The equations are 



H(z) = 



1 

N " 
1 - J2 a i z~ i 

i= 1 



(2.62) 



and, in the time domain, 

N 

x{n) — e(n) + ^ a/x(n — i) (2.63) 

(=i 



The ACF can be derived from the corresponding filter impulse response 
coefficients /?,: 

OO 

H(z) = J2 h iZ~‘ (2-64) 

i = 0 



and, accordingly, it is an infinite sequence: 



rip) = o] J2 h t h i+p 

1=0 

The power spectrum is 
S(f) = ■ 



i - J2 °i e 

i= 1 



-j2mf 



(2.65) 



( 2 . 66 ) 



An example is shown in Figure 2.6 for the hlter transfer function: 
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f/f. 



s 
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05 



FIG. 2.6 Spectrum of an AR signal. 



(1 + 0.80r~' + 0.64 z - 2 )(1 - 1.232-' + 0.64z" 2 ) 



Since the spectrum of a real signal is symmetric about the zero frequency, 
only the band [0 ,/ s / 2 ], where f s is the sampling frequency, is represented. 

For MA signals, the direct relation (2.59) has been derived between the 
ACF and filter coefficients. A direct relation can also be obtained here by 
multiplying both sides of the recursion definition (2.63) by x(n — p ) and 
taking the expectation, which leads to 



For p ^ N, the sequence r(p) is generated recursively from the N preceding 
terms. ForO ^ p < N — 1, the above equations establish a linear depen- 
dence between the two sets of filter coefficients and the first ACF values. 

They can be expressed in matrix form to derive the coefficients from the 
ACF terms: 



N 




(2.67) 



N 




( 2 . 68 ) 
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(2.69) 



' r(0) r(l) • 

Ki) Ki) • 


1 




l 

-«i 




lp \?o • 

1 


_r(N) r(N- 1) • 


■ KO) 




_~ a N _ 




_ o _ 



Equation (2.69) is a normal equation, called the order N forward linear 
prediction equation, studied in a later chapter. 

To complete the AR signal analysis, note that the generating filter 
impulse response is 

N 

h p = r(p) - Y a ‘ r ( p + (2-70) 

(=i 

This equation is a direct consequence of definition relations (2.63) and 
(2.64), if we notice that 

h p — E[x(n)e(n — p)\ (2.71) 

Since tip) — r(—p), equation (2.68) shows that the impulse response h p is 
zero for negative p, which reflects the filter causality. 

It is also possible to relate the AC function of an AR signal to the poles of 
the generating filter. 

For complex poles, the filter r-transfer function can be expressed in 
factorized form: 



1 (2-72) 

na-A^xi-^- 1 ) 



Using the equality 



S(f) = crl | H (z)H(z ~ 1 ) | ,- |=1 = 



Y x / fiz p 

p=—oo 



1 - 1=1 



(2.73) 



the series development of the product H(z)H(z~ l ) leads to the AC function 
of the AR signal. The rational function decomposition of H{z)H{z~ x ) yields, 
after simplification, 



N/q 

r(p ) - Y C0S t" Ar g (Pi) + Pi] 

i= 1 



(2.74) 



where the real parameters a,- and Pi are the parameters of the decomposition 
and hence are related to the poles P,. 
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It is worth pointing out that the same expression is obtained for the 
generating filter of the type FIR/IIR, but then the parameters a, and /1,- 
are no longer related to the poles: they are independent. 

A limitation of AR spectra is that they do not take on zero values, 
whereas MA spectra do. So it may be useful to combine both [7]. 



2.6. ARMA SIGNALS 

An ARMA signal is obtained through a filter with a rational r-transfer 
function: 



N i 

E b i z 

H{z) = (2.75) 

1 - E«r 7 "' 

!=1 

In the time domain, 

N N 

x(n) — ^ bje(n — /') + ^ a,x(« — i) (2.76) 

;=o ;=i 



The denominator and numerator polynomials of H(z) can always be 
assumed to have the same order; if necessary, zero coefficients can be added. 
The power spectral density is 



S(f) = ^ 



N z 

/=0 

N 

1 - £ aie-P*? 

i= 1 



(2.77) 



A direct relation between the ACF and the coefficients is obtained by 
multiplying both sides of the time recursion (2.76) by x(n — p ) and taking the 
expectation: 

N N 

r(p ) = a i r (P ~ 0 + XI biE(e(n - i)x(n - />)] (2.78) 

i= 1 !=0 



Now the relationships between ACF and filter coefficients become non- 
linear, due to the second term in (2.78). However, that nonlinear term 
vanishes for p > N because x(n — p) is related to the input signal value 
with the same index and the preceding values only, not future ones. 
Hence, a matrix equation can again be derived involving the AR coefficients 
of the ARMA signal: 
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(2.79) 



r(N) 
r(N+ 1) 


~ ? ■■ 
% hd 


■ r(0)' 

■ KD 




1 

—a x 


o 

II 


1 


_ r(2N) 


r(2N - 1) • 


■ r(N)_ 




_ —a N _ 




_ 0 _ 



For p > N, the sequence r{p) is again generated recursively from the N 
preceding terms. 

The relationship between the first (N + 1) ACF terms and the filter coef- 
ficients can be established through the filter impulse response, whose coeffi- 
cients hi satisfy, by definition, 

OO 

x(n) — X h t e{n — i) (2.80) 

i=0 

Now replacing x(n — /) in (2.76) gives 

TV No o 

x(n) = X M" - 0 + X a < X V(" ~ 1 _ 

1=0 i=l 7=0 

and 



N 



N 



x (n) = X - 0 + XI ~ X aihk ~‘ 



(2.81) 



i= 0 A-l (=1 

Clearly, the impulse response coefficients can be computed recursively: 
h 0 — b 0 , h k — 0 for k < 0 



h k = b k + X a i b k-i’ k ^ 1 



i=i 



In matrix form, for the N + 1 hrst terms we have 



(2.82) 



1 


0 


0 


... o' 




h 0 


0 


0 


0 ' 


— Cl\ 


l 


0 


... 0 




hi 


ho 


0 


0 


—a 2 


-a x 


1 


... 0 




h 2 


hi 


ho 


0 


_ ~ a N 


~ a N-\ 


~ a N- 2 


... 1 




_h N 


hN- 1 


A TV— 2 ■ ■ ■ 


h 0 _ 



bo 


0 


0 


1 

o 


bi 


bo 


0 


0 


b 2 


b i 


bo 


0 


_b N 


^TV-1 


b N _ 2 ■ ■ ■ 


b 0_ 



(2.83) 
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Coming back to the ACF and (2.78), we have 

N N 

b i E [ e ( n ~ iMn - />)] = a) ^ lph,^ 

i= 0 7=0 

and, after simple manipulations, 



N N~P 

<P) = atrip - 0 + a 2 e ^ b j+p hj 
7=0 



Z=1 



Now, introducing the variable 

dip) - J2 b J+P h j 
7=0 

we obtain the matrix equation 





" K0) ' 




' K 0) ' 




' d( 0) " 




Ki) 


+ st' 


K-i) 


2 

= 


d{ 1) 




_r(N)_ 




_r(-N)_ 




d(N)_ 



where 



1 



— «! 1 

— — «a/_i 

0 — flj 

0 —a 2 

0 -a N • ’ 

0 0 



—a N 

■ 0 



0 



(2.84) 



(2.85) 



( 2 . 86 ) 



For real signals, the first (N + 1) ACF terms are obtained from the equation 

(2.87) 



' KO) ' 




" d(0) ' 


Ki) 


— (Tp[.n/ + si'\ 1 


d( 1) 


_r{N)_ 




d(N)_ 



In summary, the procedure to calculate the ACF of an ARMA signal 
from the generating filter coefficients is as follows: 
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1. Compute the first (TV + 1) terms of the filter impulse response through 
recursion (2.82). 

2. Compute the auxiliary variables for 0 ^ p ^ TV. 

3. Compute the first (TV + 1) ACF terms from matrix equation (2.87). 

4. Use recursion (2.68) to derive r(p) when p ^ TV + 1. 

Obviously, finding the ACF is not a simple task, particularly for large 
filter orders TV. Conversely, the filter coefficients and input noise power 
can be retrieved from the ACF. First the AR coefficients a, and the scalar 
b(,b N al can be obtained from matrix equation (2.79). Next, from the time 
domain definition (2.76), the following auxiliary signal can be introduced: 

N N 

u(n) — x(n) — ^ ajX(n — i) — e{n) + ^ bfi{n — i) (2.88) 

7=1 7=1 

where b 0 = 1 is assumed. 

The ACF r u (p) of the auxiliary signal u(n) is derived from the ACF of x(n) 
by the equation 

r„(P) = E[u(n)u(n - p)] 

N N N N 

= r(p) - a ‘ rip + o - X! a ' r( j } _, ')+X!E a i a f(p +j- o 

i= 1 i= 1 i=l 7=1 



or, more concisely by 

N 

c ‘ r(p - 

i=—N 



where 



N 

Ci = c_i, c 0 = 1 + E] aj 
7=1 



N 

Ci — —aj + ^ cijCij—i 

7='+l 



(2.89) 



(2.90) 



But r„(p) can also be expressed in terms of MA coefficients, because of the 
second equation in (2.88). The corresponding expressions, already given in 
the previous section, are 



r u (p) = 



E bib i+p , \p\ ^ N 

i = 0 

0, \p\ > 
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From these N + 1 equations, the input noise power aj and the MA 
coefficients b,{ 1 < i ^ N;b 0 — 1) can be derived from iterative Newton- 
Raphson algorithms. It can be verified that b 0 b N y equals the value we 
previously found when solving matrix equation (2.79) for AR coefficients. 

The spectral density S(f) can be computed with the help of the auxiliary 
signal u(n) by considering the filtering operation 

N 

x(n) — u(n) + ^ ciix(n — i) (2.91) 

i=i 



which, in the spectral domain, corresponds to 



S(f) = 



r k(0) + 2 Y, r u(p ) C0S(2 npf) 
i 

N 2 

1 - £ aie-fr'V 

i= 1 



(2.92) 



This expression is useful in spectral analysis. 

Until now, only real signals have been considered in this section. Similar 
results can be obtained with complex signals by making appropriate com- 
plex conjugations in equations. An important difference is that the ACF is 
no longer symmetrical, which can complicate some procedures. For exam- 
ple, the matrix equation (2.86) to obtain the first ( N + 1) ACF terms 
becomes 



s/r + ss^'r — a 2 d (2.93) 

where r is the correlation vector, r the vector with complex conjugate entries, 
and d the auxiliary variable vector. The conjugate expression of (2.86) is 

s/r + si’r — a^d (2.94) 



The above equations, after some algebraic manipulations, lead to 

[ - d'(ky'sJ’)r = o;[d - stf'(^)~ l d] (2.95) 

Now two matrix inversions are needed to get the correlation vector. Note 
that is readily obtained from (2.83) by calculating the first N + 1 

values of the impulse response of the AR filter through the recursion (2.82). 

Next, more general signals of the types often encountered in control 
systems are introduced. 



Copyright n 2001 by Marcel Dekker, Inc. All Rights Reserved. 




2.7. MARKOV SIGNALS 



Markov signals are produced by state variable systems whose evolution 
from time n to time n + 1 is governed by a constant transition matrix [8]. 

The state of a system of order N at time n is defined by a set of N internal 
variables represented by a vector X(n) called the state vector. The block 
diagram of a typical system is shown in Figure 2.7, and the equations are 

X(n + 1) = AX(n) + Bw(n) 
y(n ) — C‘X(n) + v(n) 

The matrix A is the N x N transition matrix, B is the control vector, and 
C is the observation vector [9]. The input sequence is w(n); v(n) can be a 
measurement noise contaminating the output v{ri). 

The state of the system at time n is obtained from the initial state at time 
zero by the equation 

n 

X(n) = A"X(0) + J2 A n ~'Bw(i - 1) (2.97) 

(=i 

Consequently, the behavior of such a system depends on successive 
powers of the transition matrix A. 

The z-transfer function of the system H{z), obtained by taking the z- 
transform of the state equations, is 

H{z) = CXZI N - A)~ X B (2.98) 

with In the jV x N unity matrix. 

The poles of the transfer function are the values of z for which the 
determinant of the matrix (ZI N — A) is zero. That is also the definition of 
the eigenvalues of A. 




FIG. 2.7 State variable system. 
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The system is stable if and only if the poles are inside the unit circle in the 
complex plane or, equivalently, if and only if the absolute values of the 
eigenvalues are less than unity, which can be seen directly front equation 
(2.97). 

Let us assume that w(n) is centered white noise with power a The state 
variables are also centered, and their covariance matrix can be calculated. 
Multiplying state equation (2.96) on the right by its transpose yields 

X(n + 1 )X\n + 1) = AX(n)X‘(n)A + Bw 2 (n)B l 

+ AX(n)w(n)B' + Bw(n)X t (n)A t 

The expected values of the last two terms of this expression are zero, 
because x(n) depends only on the past input values. Hence, the covariance 
matrix R xx (n + 1) is 

R xx (n + 1) = E\X(n + 1 )X'(n + 1)] = AR xx (n)A‘ + o^BB 1 (2.99) 

It can be computed recursively once the covariance of the initial condi- 
tions R xx ( 0) is known. If the elements of the w(n) sequence are Gaussian 
random variables, the state variables themselves are Gaussian, since they are 
linear combinations of past input values. 

The Markovian representation applies to ARMA signals. Several sets of 
state variables can be envisaged. For example, in linear prediction, a repre- 
sentation corresponding to the following state equations is used: 

x(n) — C r X(n) + e(n) 

( 2 . 100 ) 

X(n) = AX(n - 1) + Be(n - 1) 
with 



A = 


"01 0 
0 0 1 


... o o 
1 


, 5 = 


hi 

h 2 




0 0 o' 

_Ujv U/V-l a N- 2 


. 1 

• Cl\ _ 




_hu _ 





~\- 




Xd(n) 


C — 


0 


, X{fi) = 


x\ (n) 




.0. 




_x N _i (n) 



The elements of vector B are the filter impulse response coefficients of 
equation (2.80), and those of the state vector, x,(«) are the z'-step linear 
predictions of x(n), defined, for the ARMA signal and as shown later, by 
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N-i 



N 



Xi(n ) = a k*( n ~ ^ a i+j X ( n ~ 1 -j) + h i+A n ~ ' 



k= 1 



( 2 . 101 ) 



It can be verified that the characteristic polynomial of the matrix A, 
whose roots are the eigenvalues, is the denominator of the filter transfer 
function H(z) in (2.75). 

Having presented methods for generating signals, we now turn to analysis 
techniques. First we introduce some important definitions and concepts [10]. 

2.8. LINEAR PREDICTION AND INTERPOLATION 

The operation which produces a sequence e(n) from a data sequence x(ri), 
assumed centered and wide-sense stationary, by the convolution 



is called one-step linear prediction error filtering, if the coefficients are cal- 
culated to minimize the variance of the output e(n). The minimization is 
equivalent, through derivation, to making e(n) orthogonal to all previous 
data, because it leads to: 



Since e(ri) is a linear combination of past data, the following equations are 
also valid: 



and the sequence e(«), called the prediction error or the innovation, is a 
white noise. Therefore the one-step prediction error filter is also called the 
whitening filter. The data x(n) can be obtained from the innovations by the 
inverse filter, assumed realizable, which is called the model or innovation 
filter. The operations are shown in Figure 2.8. 

The prediction error variance E a — E[e 2 (n)\ can be calculated from the 
data power spectrum density S(e J "‘) by the conventional expressions for 
digital filtering: 



OO 




( 2 . 102 ) 



E[e(n)x(n — /)] = 0, / ^ 1 



(2.103) 



E[e(n)e(n — /)] = 0, i ^ 1 



(2.104) 



Ea = hf \Ae il °)\ 2 S{e Joi )dw 



(2.105) 



or, in terms of r-transforms. 



E a — [ A(z)A(z x )S(z) — 

j2tt J\ z \ = i z 



(2.106) 
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FIG. 2.8 Linear prediction filter and inverse filter. 



where A(z) is the transfer function of the prediction error filter. The predic- 
tion filter coefficients depend only on the input signal, and the error power 
can be expressed as a function of S( e J "‘) only. To derive that expression, we 
must first show that the prediction error filter is minimum phase; in other 
words, all its zeros are inside or on the unit circle in the complex z-plane. 

Let us assume that a zero of A(z), say z 0 , is outside the unit circle, which 
means |z 0 | > 1, and consider the filter A'(z) given by 



A'(z) = A(z ) 



- — z 0 1 - ~ -0 1 
z — z 0 z — Zq 



As Figure 2.9 shows, 



Z-Zo 1 




Z-Zo 1 


1 


N 

1 

N 

o 


z=e JC0 


Z Zq 


|Z ° |2 



(2.107) 



(2.108) 



and the corresponding error variance is 

K = —*72 E a < E a 

l z ol 



(2.109) 



which contradicts the definition of the prediction filter. Consequently, the 
prediction filter A(z) is minimum phase. 

In (2.106) for E a , we can remove the filter transfer function with the help 
of logarithms, taking into account that the innnovation sequence has a 
constant power spectrum density; thus, 

C dz f _ , dz C dz 

2njlnE a = / ln^(r) b / In A(z ) b / lnS^z) — (2.110) 

J |l|=l Z J |s|=l Z J|z|=l Z 



Now, since A(z) is minimum phase, In A(z) is analytic for |z| ^ 1 and the 
unit circle can be replaced in the above integral with a circle whose radius is 
arbitrarily large, and since 
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FIG. 2.9 Reflection of external zero in the unit circle. 



lim A(z) — a 0 — 1 

z—>o o 



the first integral vanishes on the right side of (2.110). The second integral 
also vanishes because it can be shown, by a change of variables from z' 1 to z 
that it is equal to the first one. 

Finally, the prediction error power is expressed in terms of the signal 
power spectrum density by 



E a = exp 



1 

2tt 



J In S(e '"‘) i/oj J 



( 2 . 111 ) 



This very important result is known as the Kolmogoroff-Szego formula. 
A useful signal parameter is the prediction gain G, defined as the signal- 
to-prediction-error ratio: 



G — f S(e^" > ) do) / exp 
2tt J 



- 1 - [ In S(e jo) )dco 

J —Ti 



( 2 . 112 ) 



Clearly, for a white noise G — 1. 

At this stage, it is interesting to compare linear prediction and interpola- 
tion. Interpolation is the filtering operation which produces from the data 
x(n) the sequence 



e i( n ) = hj x (n - j), h 0 = 1 (2.113) 
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with coefficients calculated to minimize the output power. Hence, e,(n) is 
orthogonal to past and future data: 

E[ei(n)x(n - k)\ - E f 8(k) (2. 1 14) 

where 8(k ) is the Dirac distribution and 

(2.115) 



Clearly, the interpolation error e,(n) is not necessarily a white noise. Taking 
the z-transform of both sides of the orthogonal relationship (2.114) leads to 

//(r).S'(-) = E, (2.116) 

Also 



Ei = ^f H(z)H(z~ l )S(z) — 
fix J\ z \-\ z 

Combining equations (2.116) and (2.117) gives 




Now, it is known from linear prediction that 



S(e Jco ) 

and 



Eg 

M(e>) | 2 



E, 




\A(e JO) )\ 2 dco 




(2.117) 



(2.118) 



(2.119) 



( 2 . 120 ) 



Since a 0 — 1, we can conclude that £,■ ^ E a ; the interpolation error 
power is less than or equal to the prediction error power, which is a not 
unexpected result. 

Linear prediction is useful for classifying signals and, particular, distin- 
guishing between deterministic and random processes. 



2.9. PREDICTABLE SIGNALS 

A signal x(n) is predictable if and only if its prediction error power is null: 
= M( e> )l 2 S(e jm )dco = 0 (2.121) 

or, in the time domain, 
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( 2 . 122 ) 



OO 

x(n) — a,.Tc(n — /) 

/=i 

which means that the present value x(n) of the signal can be expressed in 
terms of its past values. The only signals which satisfy the above equations 
are those whose spectrum consists of lines: 

N 

S(0 = E \ s i\ 2 ^-^) (2-123) 

i= 1 

The scalars S,- are the powers of individual lines. The integer N can be 
arbitrarily large. The minimum degree prediction filter is 

N 

A m (z) = Y\(\-e lm ‘z- v ) (2.124) 

i= 1 

However all the filters A(z) with 

OO 

A(z) = 1 - (2-125) 

1=1 

and such that A(e J “‘) = 0 for 1 < i < N satisfy the definition and are pre- 
diction filters. 

Conversely, since A( r) is a power series, A(e ja> ) cannot equal zero for 
every co in an interval, and equations (2.121) and (2.122) can hold only if 
S(e Ja> ) — 0 everywhere except at a countable set of points. It follows that 5 
(e ja> ) must be a sum of impulses as in (2.123), and A(z ) has corresponding 
zeros on the unit circle. 

Finally, a signal x(n) is predictable if and only if its spectrum consists of 
lines. 

The line spectrum signals are an extreme case of the more general class of 
bandlimited signals. A signal x(n) is said to be bandlimited if S(e ]u> ) = 0 in 
one or more frequency intervals. Then a filter H(co) exists such that 

H{co)S(e JW ) = 0 (2.126) 

and, in the time domain, 

OO 

hjX(n — i) — 0 

i =— oo 

With proper scaling, we have 

OO OO 

x{n) = - E h/x(n — f) — ^ ^ h_iX(n -f - 1 ) (2.127) 

i=l i=l 
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Thus the present value can be expressed in terms of past and future 
values. Again the representation is not unique, because the function //(&>) 
is arbitrary, subject only to condition (2.126). It can be shown that a band- 
limited signal can be approximated arbitrarily closely by a sum involving 
only its past values. Equality is obtained if S(e J0> ) consists of lines only. 

The above sections are mainly intended to serve as a gradual preparation 
for the introduction of one of the most important results in signal analysis, 
the fundamental decomposition. 

2.10. THE FUNDAMENTAL (WOLD) DECOMPOSITION 

Any signal is the sum of two orthogonal components, an AR signal and a 
predictable signal. More specifically: 

Decomposition Theorem 

An arbitrary unpredictable signal x(n) can be written as a sum of two 
orthogonal signals: 



where x p (ri) is predictable and x r (n) is such that its spectrum S r (E JC0 ) can be 
factored as 



and H(z) is a function analytic for l-l > 1- 

The component x r (n) is sometimes said to be regular. Following the 
development in [10], the proof of the theorem begins with the computation 
of the prediction error sequence 



As previously mentioned, the prediction coefficients are computed so as 
to make e(n) orthogonal to all past data values, and the error sequence is a 
white noise with variance E a . 

Conversely, the least squares estimate of x(n) in terms of the sequence e 
(i n ) and its past is the sum 



x(n ) = x p (n) + x r (n) 



(2.128) 




(2.129) 



OO 




(2.130) 



OO 




(2.131) 



i=0 



and the corresponding error signal 
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x p (n) — x(n) — x r (ri) 

is orthogonal to e(n — i) for / > 0. In other words, e(n) is orthogonal to 
x p (n + k) for k ^ 0. 

Now, e(n) is also orthogonal to x r {n — k) for k ^ 1, because x r (n — k ) 
depends linearly on e(n — k) and its past and e(n) is white noise. Hence, 

E[e(n)[x(n — k) — x r (n — A:)]] = 0 = E\e(n)x p (n — k)\, k ^ 1 

and 



E[e(n)x p (n — k)\ — 0, all k 


(2.132) 


Expression (2.131) yields 




E[x r (n)x p (n — k) — 0, all k 


(2.133) 


The signals x,.(n) and x p (n) are orthogonal, and their powers add up to give 
the input signal power: 


E[x 2 (n)\ = E[Xp(n)\ + E[x;.(n)] 


(2.134) 


Now (2.131) also yields 




(X) 

E[x;(n)] = E a J2tf < C[x 2 (»)] 

i= 0 


(2.135) 


Therefore, 




H 

iM 8 




converges for r > 1 and defines a linear causal system which produces x r (n) 
when fed with e(ri). 

In these conditions, the power spectrum of x r (n) is 


S,V“) = E a \H(en \ 2 


(2.136) 



The hltering operations which have produced x r (n) from x(n) are shown 
in Figure 2.10. If instead of x(n) the component in a signal sequence 
x(n) — x r (n) — x p (ri) is fed to the system, the error e p (n), instead of e(n), is 
obtained. The sequence 



e p (n) = e(n) - 



OO 

x r (n) — diX r (n — 0 
i=i 



(2.137) 



is a linear combination of e(n) and its past, via equation (2.131). But, by 
definition, 
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FIG. 2.10 Extraction of the regular component in a signal. 



e p (n) = x p (n) - Y ai x P (" ~ 0 



1=1 



which, using equations (2.132) and (2.133), yields 



E[e~„(n)\ = E\ 



= 0 



e(n) - \x r (n) - Y a i x M ~ 0 ] 

OO 

x p (n) - Y a i x p( n - 0 



i=l 



(2.138) 



Therefore x p (n) is a predictable signal and the whitening hlter A(z) is a 
prediction error filter, although not necessarily the minimum degree hlter, 
which is given by (2.124). On the contrary, A( r) is the unique prediction 
error hlter of x(n). 

Finally, the spectrum S(e Ja> ) of the unpredictable signal x(n) is a sum 

S(e Jay ) = S r (e ja> ) + S p (e ja> ) (2. 1 39) 

where S r (e ]CO ) is the continuous spectrum of the regular signal x r (ri), and 
S p (e J, ’‘) is the line spectrum of the deterministic component, the two com- 
ponents being uncorrelated. 



2.11. HARMONIC DECOMPOSITION 

The fundamental decomposition is used in signal analysis as a reference for 
selecting a strategy [1 1], As an illustration let us consider the case, frequently 
occurring in practice, where the signal to be analyzed is given as a set of 2 
N + 1 autocorrelation coefficients r(p) with —A ^ p ^ N, available from a 
measuring procedure. To perform the analysis, we have two extreme 
hypotheses. The first one consists of assuming that the signal has no deter- 
ministic component; then a set of N prediction coefficients can be calculated 
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as indicated in the section dealing with AR signals by (2.69), and the power 
spectrum is obtained from (2.66). 

But another hypothesis is that the signal is essentially deterministic and 
consists of N sinusoids in noise. The associated ACF for real data is 

N 

rip) = 2 l-Sjtl ’ cos (pa> k ) + o*8(p) (2. 140) 

k= 1 

where a> k are the radial frequencies of the sinusoids and S k are the ampli- 
tudes. In matrix form. 



r(0) - a] 




1 


1 


1 


K 1) 




cos coi 


COS 0 ) 2 


• COS COjy 


K 2) 


= 2 


cos 2u>\ 


COS 2 (JL>2 


cos 2 com 


_ r (N) _ 




COS N(l>\ 


COS Nco 2 • 


■ cos Nco n 



l^il 2 

\S 2 1 2 



_\S N \ 2 _ 



(2.141) 



The analysis of the signal consists of finding out the sinusoid frequencies 
and amplitudes and the noise power a 2 . To perform that task, we use the 
signal sequence x(n ) . According to the above hypothesis, it can be expressed 
by 



x(n) = x p {ri) + e(n) 
with 



(2.142) 



N 

x p (n ) = Y a i x p( n ~ 0 

i=t 

Now, the data signal satisfies the recursion 

N N 

x(n) — dix{n — i) + e(n) — ^ a pin — i) (2. 143) 

i=i 1=1 

which is just a special kind of ARMA signal, with b 0 = 1 and b ,■ = —a t in 
time domain relation (2.76). Therefore results derived in Section 2.6 can be 
applied. 
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The impulse response can be computed recursively, and relations (2.82) 
yield h k — 8(k). The auxiliary variable in (2.85) is d(p) — —a p ( 1 ^ p ^ N). 
Rewriting the equations giving the autocorrelation values (2.84) leads to 

N 

r{p) — ^2 a L r(p - i) + er j(-a p ), 1 < p < N (2.144) 

i= 1 

or, in matrix form for real data, 



' k o) f(i) • 


■ r(N) 




1 




l 


Ki) m ■ 


■ r(N - 1) 




-a, 


2 


— Cl\ 


r(N) r(N — 1 ) • 


■ r(0) 




a N _ 


= °e 


_ ~ a N _ 



This is an eigenvalue equation. The signal autocorrelation matrix is sym- 
metric, and therefore all eigenvalues are greater than or equal to zero. For N 
sinusoids without noise, the (N + 1) x (N + 1) autocorrelation matrix has 
one eigenvalue equal to zero; adding to the signal a white noise component 
of power ajr results in adding a] to all eigenvalues of the autocorrelation 
matrix. Thus, the noise power a~ is the smallest eigenvalue of the signal, and 
the recursion coefficients are the entries of the associated eigenvector. As 
shown in the next chapter, the roots of the filter 

N 

A(z) = 1 - (2.146) 

i=i 

called the minimum eigenvalue filter, are located on the unit circle in the 
complex plane and give the frequencies of the sinusoids. The analysis is then 
completed by solving the linear system (2.141) for the individual sinusoid 
powers. The complete procedure, called the Pisarenko method, is presented 
in more detail in a subsequent chapter [12]. 

So, it is very important to notice that a signal given by a limited set of 
correlation coefficients can always be viewed as a set of sinusoids in noise. 
That explains why the study of sinusoids in noise is so important for signal 
analysis and, more generally, for processing. 

In practice, the selection of an analysis strategy is guided by a priori 
information on the signal and its generation process. 



2.12. MULTIDIMENSIONAL SIGNALS 

Most of the algorithms and analysis techniques presented in this book are 
for monodimensional real or complex sequences, which make up the bulk of 
the applications. However, the extension to multidimensional signals can be 
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quite straightforward and useful in some important cases — for example, 
those involving multiple sources and receivers, as in geophysics, underwater 
acoustics, and multiple-antenna transmission systems [13]. 

A multidimensional signal is defined as a vector of N sequences 



X(n) = 



*1 («) 
*2 (n) 



x N (n) 



For example, the source and receiver vectors in Figure 1.1 are multidimen- 
sional signals. The N sequences are assumed to be dependent; otherwise they 
could be treated as N different scalar signals. They are characterized by the 
joint density function between them. 

A second-order stationary multidimensional random signal is character- 
ized by a mean vector M x and a covariance matrix R xx : 



M x = 



E[x x {ri)\ 

E\x 2 {n)\ 



E[x N (n)] 



R xx = E[(X(n) - M x )(X(n) - Mj] 



(2.147) 



The diagonal terms of R xx are the variances of the signal elements. If the 
elements in the vector are each Gaussian, then they are jointly Gaussian and 
have a joint density: 

P(X] = {2jr) N/ 2[ te tR ji/z ^Pt- 2 ( r - M x )'R- l x (X - M x )\ (2. 148) 
For the special case N — 2, 



Rxx = 



v . rr Y , 
2 



(2.149) 



with p the correlation coefficient defined by 
1 



P = 



E[(x\ - m x )(x 2 - m 2 )\ 



(2.150) 



If the signal elements are independent, R xx is a diagonal matrix and 



^=n^V ex p 

t=i ofv2jr 



(Xj ~ /»,-) 

2a} 



(2.151) 



Furthermore, if all the variances are equal, then 
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R xx = o 2 I n (2.152) 

This situation is frequently encountered in roundoff noise analysis in imple- 
mentations. 

For complex data, the Gaussian joint density (2.148) takes a slightly 
different form: 

p(X) = TtV ex P[-( X - M *T R xl ( x - M *)] (2. 1 53) 

7 x det R xx 

Multidimensional signals appear naturally in state variable systems, as 
shown in Section 2.7. 



2.13. NONSTATIONARY SIGNALS 



A signal is nonstationary if its statistical character changes with time. The 
fundamental decomposition can be extended to such a signal, and the reg- 
ular component is 

OO 

x r (n) — ^ hj(n)e(n — i) (2.154) 

i= 0 



where e{n) is a stationary white noise. The generating filter impulse response 
coefficients are time dependent. An instantaneous spectrum can be defined 
as 



S(f, n) = a 2 e 



OO 

Y,hi{n)e~ i2nfi 
i= 0 



(2.155) 



So, nonstationary signals can be generated or modeled by the techniques 
developed for stationary signals, but with additional means to make the 
system coefficients time varying [14]. For example, the ARM A signal is 

N N 

x(n) — ^ bj(n)e(n — i ) + ^ a,(«).v(« — i) (2.156) 

i=0 (=1 



The coefficients can be generated in various ways. For example, they can 
be produced as weighted sums of K given time functions fk(n): 

K 

a i(n) = '%2 a ikf k (n) (2.157) 

k= 1 



These time functions may be periodic functions or polynomials; a simple 
case is the one-degree polynomial, which corresponds to a drift of the coef- 
ficients. The signal depends on (2 N + 1 )K time-independent parameters. 



> 
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The set of coefficients can also be a multidimensional signal. A realistic 
example in that class is shown in Figure 2.11. The N time-varying filter 
coefficients a,(«) are obtained as the outputs of N fixed-coefficient filters 
fed by independent white noises with same variances. A typical choice for 
the coefficient filter transfer function is the first-order low-pass function 

H,(z) = , 0 « y < 1 (2.158) 

1 — yz 

whose time constant is 



r = 




(2.159) 



For y close to unity, the time constant is large and the filter coefficients are 
subject to slow variations. 

The analysis of nonstationary signals is complicated because the ergodi- 
city assumption can no longer be used and statistical parameters cannot be 
computed through time averages. Natural signals are nonstationary. 
Flowever, they are often slowly time varying and can then be assumed 
stationary for short periods of time. 



2.14. NATURAL SIGNALS 

To illustrate the preceding developments, we give several signals from dif- 
ferent application fields in this section. 

Speech is probably the most commonly processed natural signal through 
digital communication networks. The waveform for the word “FATHER” 
is shown in Figure 2.12. The sampling rate is 8 kHz, and the duration is 



e,(n) e N (n) 




FIG. 2.11 Generation of a nonstationary signal. 
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Amplitude 




Time (ms) 



200 



400 



FIG. 2.12 Speech waveform for the word “father.” 

about 0.5 s. Clearly, it is nonstationary. Speech consists of phonemes and 
can be considered as stationary on durations ranging from 10 to 25 ms. 

It can be modeled as the output of a time-varying purely recursive filter 
(AR model) fed by either a string of periodic pulses for voiced sections or a 
string of random pulses for unvoiced sections [15]. 

The output of the demodulator of a frequency-modulated continuous 
wave (FMCW) radar is shown in Figure 2.13. It is basically a distorted 
sinusoid corrupted by noise and echoes. The main component frequency 
is representative of the distance to be measured. 

An image can be represented as a one-dimensional signal through scan- 
ning. In Figure 2.14, three lines of a black-and-white contrasted picture are 
shown; a line has 256 samples. The similarities between consecutive lines can 
be observed, and the amplitude varies quickly within every line. The picture 
represents a house. 



2.15. SUMMARY 

Any stationary signal can be decomposed into periodic and random com- 
ponents. The characteristics of both classes can be studied by considering 
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Amplitude 



+ 1 




FIG. 2.13 FMCW radar signal. 



as main parameters, the ACF, the spectrum, and the generating model. 
Periodic signals have been analyzed first. Then random signals have been 
defined, with attention being focused on wide-sense stationary signals; 
they have second-order statistics which are independent of time. 
Synthetic random signals can be generated by a filter fed with white 
noise. The Gaussian amplitude distribution is especially important 
because of its nice statistical properties, but also because it is a model 
adequate for many real situations. The generating filter structures corre- 
spond to various output signal classes: MA, AR, and ARMA. The con- 
cept of linear prediction is related to a generating filter model, and the 
class of predictable signals has been defined. A proof of the fundamental 
Wold decomposition has been presented, and, as an application, it has 
been shown that a signal specified by a limited set of correlation coeffi- 
cients can be viewed as a set of sinusoids in noise. That is the harmonic 
decomposition. 

In practice, signals are nonstationary, and, in general, short-term statio- 
narity or slow variations have to be assumed. Several natural signal exam- 
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Sample number 

FIG. 2.14 Image signal: three lines of a black-and-white picture. 



white 



gray 



black 



pies, namely speech, radar, and image samples, have been selected to illus- 
trate the theory. 



EXERCISES 

1. Calculate the z-transform F^(z) of the damped cosinusoid 

f 0, n < 0 

^(«)=J e -o-i» cos ^ i „^ 0 

and show the poles in the complex plane. 

Give the signal energy spectrum and verify the energy relationship 

00 , * 

E y = AW = — / Y r (z) Y r {z~ 1 )z" 1 dz 
■ to l-1=i 

Give the coefficients, initial conditions, and diagram of the second- 
order section which generates y R (n). 

2. Find the ACF of the signal 

ir 1 . ir 
x(n) — cos 77— + - sin n — 

Determine the recurrence equation satisfied by x(n) and give the initial 
conditions. 



Copyright n 2001 by Marcel Dekker, Inc. All Rights Reserved. 




3. Evaluate the mean and variance associated with the uniform probability 
density function on the interval [x ] , x 2 \- Comment on the results. 

4. Consider the signal 



x(n) — 



0 n < 0 

0.8.Y (n — 1 ) + e(n), n ^ 1 



assuming e(n) is a stationary zero mean random sequence with power 
a] — 0.5. The initial condition is deterministic with value x(0) = 1. 

Calculate the mean sequence m n — E[x{n)\. Give the recursion, for 
the variance sequence. What is the stationary solution. Calculate the 
ACF of the stationary signal. 

5. Find the first three terms of the ACF of the AR signal. 
x(n) = \.21x(n — 1) — 0.81 x(n — 2) + e(n) 



where e(n) is a unit power centered white noise. 

6. An ARMA signal is defined by the recursion 

x(n) — e(n) + 0.5 e(n — 1 ) + 0.9 e(n — 2) + x{n — 1) — 0.5x(« — 2) 



where e(n) is a unit variance centered white noise. Calculate the gener- 
ating filter z-transfer function and its impulse response. Derive the 
signal ACF. 

7. A two-dimensional signal is defined by 



X(n) = 



x\ in) 
x 2 (n) 




0.63 

0.09 



0.36 

0.86 



X(n- 1) + 



0.01 

0.06 



n ^ 0 

e(n), n ^ 1 



8 . 



9. 



where e(n) is a unit power centered white noise. Find the covariance 
propagation equation and calculate the stationary solution. 

A measurement has supplied the signal autocorrelation values 
/■(O) — 5.75, /•( 1 ) = 4.03, r(2) = 0.46. Calculate the two coefficients of 
the second-order linear predictor and the prediction error power. 
Give the corresponding signal power spectrum. 



Find the 


eigenvalues 


of the 




'1.00 


0.70 


0.08 


*3 = 


0.70 


1.00 


0.70 




0.08 


0.70 


1.00 



and the coefficients of the minimum eigenvalue filter. Focate the zeros 
of that filter and give the harmonic spectrum. Compare with the pre- 
diction spectrum obtained in the previous exercise. 
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3 

Correlation Function and Matrix 



The operation and performance of adaptive filters are tightly related to the 
statistical parameters of the signals involved. Among these parameters, the 
correlation functions take a significant place. In fact, they are crucial 
because of their own value for signal analysis but also because their terms 
are used to form correlation matrices. These matrices are exploited directly 
in some analysis techniques. However, in the efficient algorithms for adap- 
tive filtering considered here, they do not, in general, really show up, but 
they are implied and actually govern the efficiency of the processing. 
Therefore an in-depth knowledge of their properties is necessary. 
Unfortunately it is not easy to figure out their characteristics and establish 
relations with more accessible and familiar signal features, such as the spec- 
trum. 

This chapter presents correlation functions and matrices, discusses their 
most useful properties, and, through examples and applications, makes the 
reader accustomed to them and ready to exploit them. To begin with, the 
correlation functions, which have already been introduced, are presented in 
more detail. 



3.1. CROSS-CORRELATION AND 
AUTOCORRELATION 

Assume that two sets of N real data, x(n) and y(n), have to be compared, 
and consider the scalar a which minimizes the cost function 



> 
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(3.1) 



N 

J(N) — y^[y(n) — ax(n)] 2 

n= 1 

Setting to zero the derivative of J(N ) with respect to a yields 

N 

E x(n)y(n ) 

n= 1 

a = — 

E x2 ( n ) 

n= 1 

The minimum of the cost function is 

N 

= [1 - ^(N)] J> 2 («) 

n= 1 

with 



(3.2) 



(3.3) 



N 

E x(n)y(n ) 

£(A0 = n=1 = (3.4) 

/V /V 

/ ^ X' 2 («) / E V 2 (n) 

V n=l V " =l 

The quantity /r(A^), cross-correlation coefficient, is a measure of the degree 
of similarity between the two sets of N data. To point out the practical 
significance of that coefficient, we mention that it is the basic parameter 
of an important class of prediction filters and adaptive systems — the least 
squares (LS) lattice structures in which it is computed in real time recur- 
sively. 

From equations (3.2) and (3.4), the correlation coefficient k(N) is 
bounded by 

1*001 < 1 (3.5) 

and it is independent of the signal energies; it is said to be normalized. 

If instead of x(n) we consider a delayed version of the signal in the above 
derivation, a cross-correlation function can be obtained. The general, 
unnormalized form of the cross-correlation function between two real 
sequences x(n) and y(n) is defined by 

r yx {p) = E[y(n)x(n - p)\ (3.6) 

For stationary and ergodic signals we have 

1 N 

r vxip) = Jim . , , V y{n)x{n - p) (3.7) 

TV^oo 2N + 1 n ^ N 
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Several properties result from the above definitions. For example: 

r yx (~p) = E{x(n + p)y[(n + p) - p]) = r xy (p) (3.8) 

If two random zero mean signals are independent, their cross-correlation 
functions are zero. In any case, when p approaches infinity the cross-corre- 
lation approaches zero. The magnitudes of r ip) are not, in general, max- 
imum at the origin, but they are bounded. The inequality 

[y(n)-x(n-p)] 2 >0 (3.9) 

yields the bound 

\r yx (p)\ < j[r xx (0) + r yy (0)] (3.10) 

If the signals involved are the input and output of a filter 

OO 

yin) = hiX{n - i) (3.11) 

/=0 

and 



r yx (p) = E[ y(ri)x(n - p)] = ^ h,r xx (p - i) (3.1 2) 

;=o 

the following relationships, in which the convolution operator is denoted *, 
can be derived: 

r yx (p) = r xx (p) * h{p) 

r xy (p ) = r xx (p) * K~P ) (3-13) 

r yy (p) = r xx (p) * h(p) * h(-p) 

When y(n) — x(n), the autocorrelation function (ACF) is obtained; it is 
denoted r xx (p) or, more simply, r(p ), if there is no ambiguity. The following 
properties hold: 

r(p) = r {-p), \r{p)\ < r(0) (3.14) 

For x{n) a zero mean white noise with power al, 

r(p) = a\m (3.15) 

and for a sine wave with amplitude S and radial frequency co Q , 

S 2 

rip) = —cospw 0 (3.16) 

The ACF is periodic with the same period. Note that from (3.15) and (3.16) 
a simple and efficient noise-elimination technique can be worked out to 
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retrieve periodic components, by just dropping the terms r{p) for small p in 
the noisy signal ACF. 

The Fourier transform of the ACF is the signal spectrum. For the cross- 
correlation r yx (p) it is the cross spectrum S yx (f). 

Considering the Fourier transform X (/) and Y (/) of the sequences x(n) 
and y(ri), equation (3.7) yields 

S yx (f) = Y(f)X(f) (3.17) 

where X (/') is the complex conjugate of X(f). 

The frequency domain correspondence for the set of relationships (3.13) 
is found by introduction of the filter transfer function: 



H(f) = 



Y(f ) 
X(f) 



YUVOf) 

\X(f)\ 2 



(3.18) 



Now 



S yx (f) - S xx (f)H(f) 

S X y(f) = S xx (f)H(f) (3.19) 

Syyif) - S xx (f)\H(f )\ 2 



The spectra and cross spectra can be used to compute ACF and cross- 
correlation function, through Fourier series development, although it is 
often the other way round in practice. 

Most of the above definitions and properties can be extended to complex 
signals. In that case the cross-correlation function (3.6) becomes 

r yx (p) = E[ y(n)x(n - p)] (3.20) 



In the preceding chapter the relations between correlation functions and 
model coefficients have been established for MA, AR, and ARMA station- 
ary signals. In practice, the correlation coefficients must be estimated from 
available data. 



3.2. ESTIMATION OF CORRELATION FUNCTIONS 

The signal data may be available as a finite-length sequence or as an infinite 
sequence, as for stationary signals. In any case, due to the limitations in 
processing means, the estimations have to be restricted to a finite time 
window. Therefore a finite set of N 0 data is assumed to be used in estima- 
tions. 

A first method to estimate the ACF r(p) is to calculate r x (p) by 
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(3.21) 



j N 0 

r \(p) — "jrf E x(n)x(n-p) 

* o n=P +i 

The estimator is biased because 

E[r l (pj\ = 1 ^ L r(p) (3.22) 

tv o 

However, the bias approaches zero as N 0 approaches infinity, and r\ (p) is 
asymptotically unbiased. 

An unbiased estimator is 



r 2 (p) = 



1 

N 0 -p 



E 



n = p + 1 



x{ri)x{n — p) 



(3.23) 



In order to limit the range of the estimations, which are exploited sub- 
sequently, we introduce a normalized form, given for the unbiased estimator 
by 



No 

E x{n)x(n - p) 



r nl{p) = — 



n = p + 1 

No N 0 

J2(„\ JL( 



ni/2 



£ *00 £ x\n-p) 

«=/?+ 1 n = p + 1 



(3.24) 



The variance is 



var {r„ 2 (p)} = E[r 2 n2 (p)\ - E 2 [r n2 [p)\ (3.25) 

and it is not easily evaluated in the general case because of the nonlinear 
functions involved. However, a linearization method, based on the first 
derivatives of Taylor expansions, can be applied [1]. For uncorrelated 
pairs in equation (3.24), we obtain 

var {r n2 (p)} ^ — (3.26) 

7V 0 — P 



r n(P) = 



£[v(«)x(n — />)] 
[F[.x 2 («)]F[.y 2 (« -/;)]] 1/2 



(3.27) 



is the theoretical normalized ACF. 

Thus, the variance also approaches zero as the number of samples 
approaches infinity, and r n2 (p ) is a consistent estimate. 

The calculation of the estimator according to (3.24) is a demanding 
operation for large N 0 . In a number of applications, like radiocommunica- 
tions, the correlation calculation may be the first processing operation, and 



Copyright n 2001 by Marcel Dekker, Inc. All Rights Reserved. 




it has to be carried out on high-speed data. Therefore it is useful to have less 
costly methods available. Such methods exist for Gaussian random signals, 
and they can be applied as well to many other signals. 

The following property is valid for a zero mean Gaussian signal x(n): 

r(p) =*_r yx (p)r yx ( 0) (3.28) 

where 

y(n) - sign{x(«)}, y(n) = ±1 
Hence the ACF estimate is 

j N 0 

G (p) = c— V x(n - p)sign{x(n)} (3.29) 

where 



N„ 



= 2 r ’ M = wX' xm 

u n= 1 



In normalized form, we have 



r„i(p) = 



No 

E x(n - p)sign{x(n)} 

Nq n=p+l 



No — p 



N 0 

E W«)l 



n= 1 



(3.29a) 



A multiplication-free estimate is obtained [2], which is sometimes called 
the hybrid sign correlation or relay correlation. For uncorrelated pairs and p 
small with respect to N 0 , the variance is approximately [3] 



var {r n3 (p)} 



1 

No 



Tt 



2r„(p)Aicsm[r„(p)\ + -r„(p) - 2 r;,(p)J 1 - r;,(p) 



(3.30) 



This estimator is also consistent. 

The simplification process can be carried one step further, through the 
polarity coincidence technique, which relies on the following property of 
zero mean Gaussian signals: 



[ 7 T 

— E[s\gn{x(n)x(n - 1)}] 



(3.31) 



The property reflects the fact that a Gaussian function is determined by its 
zero crossings, except for a constant factor. Hence we have the simple 
estimate 
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(3.32) 



r„ 4 (p) = sin 



r 71 



2N< 



X! sign{.\-(«)x(« -/>)} 



E n=p + 1 



which is called the sign or polarity coincidence correlator. Its variance can 
be approximated for N 0 large by [4] 



var [r n4 (p)} 



1 : t 



[1 - <{p)\ 



1 



-Arcsin r{p ) 



(3.33) 



In a Gaussian context, a more precise estimator is based on the mean of 
the absolute differences. Consider the sequence 

z p (n) — x(n) — x(n — p) (3.34) 

Its variance is 



E[z 2 p (n)\ = 2[/ (0) - r(p)\ = 2/ (0) 1 
and, 

r(p) _ , 1 E[z 2 p (n) 

HO) ~ 2 ;-o 



r(p) 

m. 



(3.35) 



(3.36) 



Using the Gaussian assumption and equation (3.28), an estimator is 
obtained as 



r*s(p) = 1 - o 



No 

\x(n) - x(n — p)\ 

n=p 



£(W»)I + \x{n-p)\) 

n=p 



(3.37) 



The variances of the three normalized estimators r n2 , r n3 , and r n4 are 
shown in Figure 3.1 versus the theoretical autocorrelation (AC) ?'(/>). 
Clearly the lower computational cost of the hybrid sign and polarity coin- 
cidence correlators is paid for by a lower accuracy. As concerns the estima- 
tor r n 5 , it has the smallest variance and is closer to the theory [6]. 

The performance evaluation of the estimators has been carried out under 
the assumption of uncorrelated sample pairs, which is no longer valid when 
the estimate is extracted on the basis of a single realization of a correlated 
process, i.e., a single data record. The evaluation can be carried out by 
considering the correlation between pairs of samples; it shows a degradation 
in performance [5]. 

For example, if the sequence x(n) is a bandlimited noise with bandwidth 
B , the following bound can be derived for a large number of data N 0 [7]: 
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1.6 




FIG. 3.1 Standard deviation of estimators versus theoretical autocorrelation for 
large number of data N. 

var| '-*» * mh l3J8) 

The worst case occurs when the bandwidth B is half the sampling fre- 
quency; then x (n) is a white noise, and the data are independent, which leads 
to 



var {r 2 (p)} 



2r 2 (0) 

Nq-p 



(3.39) 



This bound is compatible with estimation (3.26). Anyway the estimator for 
correlated data is still consistent for fixed p. 

Furthermore, the Gaussian hypothesis is also needed for the hybrid sign 
and polarity coincidence estimators. So, these estimators have to be used 
with care in practice. An example of performance comparison is presented in 
Figure 3.2 for a speech sentence of 1.25 s corresponding to N 0 — 10,000 
samples. 

In spite of noticeable differences between conventional and polarity coin- 
cidence estimators for small AC values, the general shape of the function is 
the same for both. 

> 
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Concerning correlated data, an important aspect of simplified correlators 
applied to real-life data is that they may attenuate or even cancel small 
useful components. Therefore, if small critical components in the signal 
have to be kept, the correlation operation accuracy in equipment must be 
determined to ensure that they are kept. Otherwise, reduced word lengths, 
such as 8 bits or 4 bits or even less, can be employed. 

The first estimator introduced, r { (p), is just a weighted version of r 2 (p); 
hence its variance is 



varjr^)} = var 



Np ~P 
N 0 







(3.40) 



The estimator r\(p) is biased, but it has a smaller variance than r 2 (p). It is 
widely used in practice. 

The above estimation techniques can be expanded to complex signals, 
using definition (3.20). For example, the hybrid complex estimator, the 
counterpart of r 2 (j)) in (3.29), is defined by 

'h cip) = | f yxc (0)r yxc (p) (3.41) 



with 



ryxc(p) = ^'t e ~ Km ~ l)7l/2 T l x W 

^ ™ = 1 l m 

where the summation domain itself is defined by 
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1 ^ n sS N 0 - p 



The sign function has been replaced by a phase discretization operator that 
uses the signs of the real components. This computationally efficient esti- 
mator is accurate for the complex Gaussian stationary processes [8]. 

So far, stationarity has been assumed. However, when the signal is just 
short-term stationary, the estimation has to be carried out on a compatible 
short-time window. An updated estimation is obtained every time if the 
window slides on the time axis; it is a sliding window technique, in which 
the oldest datum is discarded as a new datum enters the summation. 

An alternative, more convenient, and widely used approach is recursive 
estimation. 

3.3. RECURSIVE ESTIMATION 

The time window estimation, according to (3.21) or (3.23), is a finite impulse 
response (FIR) filtering, which can be approximated by an infinite impulse 
response (HR) filtering method. The simplest HR filter is the first-order low- 
pass section, defined by 

y(n) — x(ri) + by(n — 1), 0 < b < 1 (3.42) 

Before investigating the properties of the recursive estimator, let us con- 
sider the simple case where the input sequence x{n ) is the sum of a constant 
m and a zeor mean white noise e(n) with po wera,: . Furthermore, if y(n) — 0 
for n < 0, then 



Therefore, an estimation of the input mean m is provided by the product 
(1 — b)y(n), that is by the first-order section with r-transfer function: 




(3.43) 



Taking the expectation gives 
„ , „ 1 - b n+l 



E\ y{n)\ — m 



(3.44) 




(3.45) 



The noise power al at the output of such a filter is 




(3.46) 
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Consequently, the input noise is all the more attenuated than b is close to 
unity. Taking 6=1— <5, 0 < 5 <§C 1 yields 




The diagram of the recursive estimator is shown in Figure 3.3. The corre- 
sponding recursive equation is 

M(n) = (1 - S)M(n - 1) + Sx(n) (3.48) 



According to equation (3.44) the estimation is biased and the duration 
needed to reach a good estimation is inversely proportional to S. In digital 
filter theory, a time constant r can be defined by 

e~ l/T = b (3.49) 



which for b close to 1, leads to 
1 _ i 

XKi 1 -b~S 



(3.50) 



In order to relate recursive and window estimations, we define an equiva- 
lence. The FIR estimator 



j A'o-l 

x«) = tt x( - n - z ’) 

"o /— o 



which is unbiased, yields the output noise power 

2 



(tfo ) 2 = 



An 



Comparing with (3.47), we get 
2t*N 0 



(3.51) 



(3.52) 

(3.53) 




FIG. 3.3 Recursive estimator. 
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The recursive estimator can be considered equivalent to a window esti- 
mator whose width is twice the time constant. 

For example, consider the recursive estimation of the power of a white 
Gaussian signal x(n), the true value being a\. The input to the recursive 
estimator, x 2 (n), can be viewed as the sum of the constant m — a\ and a zero 
mean white noise, with variance 

a] = e[x 4 (n)\ - a 4 x = 2a 4 (3.54) 

The standard deviation of the output, A P, is 

A P = alV8 (3.55) 

and the relative error on the estimated power is Vs. 

Recursive estimation techniques can be applied to the ACF and to cross- 
correlation coefficients; a typical example is the lattice adaptive filter. 

Once the ACF has been estimated, it can be used for analysis or any 
further processing. 



3.4. THE AUTOCORRELATION MATRIX 



Often in signal analysis or adaptive filtering, the ACF appears under the 
form of a square matrix, called the autocorrelation matrix. 

The N x N AC matrix R xx of the real sequence x(n) is defined by 



r( 0) r(l) ■■■ r(N - 1) 

r(l) r(0) ... ■■■ r(N- 2) 



r(N - 1) r(N - 2) ■ ■ ■ r(0) 



(3.56) 



ft is a symmetric matrix and R xx — R xx . For complex data the definition is 
slightly different: 



K0) r( 1) ■■■ r(N — 1 ) 

K-l) KO)-... ■■■ r(N- 2) 

r[-(N- 1)] /[-(AC - 2)] V(0) 



(3.57) 



Since r(—p) is the complex conjugate of r(p), the matrix is Flermitian; that is, 
R* xx = R xx (3.58) 



where “*” denotes transposition and complex conjugation. 

To illustrate how naturally the AC matrix appears, let us consider an FIR 
filtering operating with N coefficients: 
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(3.59) 



N - 1 

y(n) — ^ hjx(n — i ) 

i=0 

In vector notation (3.58) is 
y(n) = //'Z(«) = X\n)H 
The output power is 

£[/(«)] = E[H'X{n)X\n)H] = H 1 R XX H (3.60) 

The inequality 

H'R XX H ^ 0 (3.61) 

is valid for any coefficient vector H and characterizes positive semidefinite 
or nonnegative definite matrices [9]. A matrix is positive definite if 

H 1 R XX H > 0 (3.62) 

The matrix R xx is also symmetrical about the secondary diagonal; hence 
it is said to be doubly symmetric or persymmetric. Define by J N the N x N 
co-identity matrix, which acts as a reversing operator on vectors and shares 
a number of properties with the identity matrix I N '- 





-1 


0 ■ 


■ 0 


0- 




-0 


0 ■ 


■ 0 


r 




0 


1 ■ 


■ 0 


0 




0 


0 ■ 


• 1 


0 


In — 


0 


0 ■ 


• 1 


0 


> Jn — 


0 


1 ■ 


■ 0 


0 




.0 


0 ■ 


■ 0 


1_ 




_ 1 


0 ■ 


■ 0 


0_ 



The double symmetry property is expressed by 

R X x-In — •A Rxx (3.64) 

Autocorrelation matrices have an additional property with respect to 
doubly symmetric matrices, namely their diagonal entries are identical; 
they are said to have a Toeplitz form or, in short, to be Toeplitz. This 
property is crucial and leads to drastic simplifications in some operations 
and particularly the inverse calculation, needed in the normal equations 
introduced in Section 1.4, for example. Examples of AC matrices can be 
given for MA and AR signals. If x(ri) is an MA signal, generated by filtering 
a white noise with power o 2 by an FIR filter having P < N/2 coefficients, 
then R xx is a band matrix. For P — 2, 

x(n) — h 0 e(n) + h { e{n — 1) (3.65) 

Using the results of Section 2.5 yields 
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(3.66) 



^MAl — a e 



h o T h i h 0 h j 0 
h(jh i Iiq + hj h{)h\ 

0 //()// 1 /?Q T /? [ 



0 0 
0 0 
0 0 



0 0 0 
0 0 0 



/?o T h\ //()/; | 
h^h i /?o T /if 



Similarly, for a first-order AR process, we have 



x(«) = ax(« — 1 ) + e(n) 
The matrix takes the form 



^ari — 



1 — a 1 



a 

2 



a a 2 

1 a 

a 1 „ 



a 



N— 1 






A-2 



a 



N - 3 



3 a-i 

JV-2 



JV-3 



T 



(3.67) 



The inverse of the AR signal AC matrix is a band matrix because the inverse 
of the filter used to generate the AR sequence is an FIR filter. In fact, except 
for edge effects, it is an MA matrix. 

Adjusting the first entry gives for the first-order case 



R 



-l _ 
ARl — 



1 



1 -a 0 

—a 1 + a 2 —a 
0 — a 1-j- gt 



L o o o 



1 T a 2 
—a 



0 

0 

0 

—a 

1 



(3.68) 



This is an important result, which is extended and exploited in subsequent 
sections. 

Since AC matrices often appear in linear systems, it is useful, before 
further exploring their properties, to briefly review linear systems. 



3.5. SOLVING LINEAR EQUATION SYSTEMS 

Let us consider a set of N 0 linear equations represented by the matrix 
equation 

MH = Y (3.69) 

The column vector Y has N 0 elements. The unknown column vector H has 
N elements, and the matrix M has N 0 rows and N columns. Depending on 
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the respective values of N 0 and N, three cases can be distinguished. First, 
when N 0 = N, the system is exactly determined and the solution is 

H — M~ x Y (3.70) 



Second, when N 0 > N, the system is overdetermined because there are more 
equations than unknowns. A typical example is the filtering of a set of N 0 
data x(n) by an FIR filter whose N coefficients must be calculated so as to 
make the output set equal to the given vector Y: 



40) 


0 


x(l) 


40) 


x(N - 1) 


x(N - 2) 


x(N 0 - 1) 


x(N 0 - 2) 



0 

0 

40) 

x(Nq - N) 



ho 

h\ 

^N - 1 



v(0) 

v(D 

y(N 0 - 1) 



(3.71) 



A solution in the LS sense is found by minimizing the scalar J: 



J — (Y — MH)\Y — MH) 



Through derivation with respect to the entries of the vector //. the solution 
is found to be 



H — (M t M)~ l M t Y (3.72) 

Third, when N 0 < N, the system is underdetermined and there are more 
unknowns than equations. The solution is then 

H — 1 Y (3.73) 

The solution of an exactly determined system must be found in all cases. 
The matrix (M'M) is symmetrical, and standard algorithms exist to solve 
equation systems based on such matrices, which are assumed positive defi- 
nite. The Cholesky method uses a triangular factorization of the matrix and 
needs about A 3 / 3 multiplications; the subroutine is given in Annex 3.1. 

Iterative techniques can also be used to solve equation (3.69). The matrix 
M can be decomposed as 

M = D + E 



where D is a diagonal matrix and is is a matrix with zeros on the main 
diagonal. Now 

H — D~ l Y — D~ X EH 



and an iterative procedure is as follows: 
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(3.74) 



Hi = D~ l Y - D~ l EH 0 

H„ +l — D~ l Y - D~ l EH„ 

The decrement after n iterations is 

H n+i - H n = ~(D- l E)" +l D 1 Y 

The procedure may be stopped when the norm of the vector H n+X — H n falls 
below a specified value. 

3.6. EIGENVALUE DECOMPOSITION 

The eigenvalue decomposition of an AC matrix leads to the extraction of the 
basic components of the corresponding signal [10-13] — hence its signifi- 
cance. 

The eigenvalues A,- and eigenvectors V,- of the N x N matrix R are defined 
by 

RVj = XjVi, 0 ^ i < N- 1 (3.75) 

If the matrix R now denotes the AC matrix R xx , it is symmetric for real 
signals and Hermitian for complex signals because 

XV* Vi = (VfRVi)* — XV* Vi (3.76) 

The eigenvalues are the real solutions of the characteristic equation 

det(7? — XI N ) — 0 (3.77) 

The identity matrix In has +1 as single eigenvalue with multiplicity N, and 
the co-identity matrix J N has ±1. 

The relations between the zeros and coefficients of polynomials yield the 
following important results: 



jv— t 



det R = j - [ Xj 


(3.78) 


i=0 




JV— 1 




Nr(0) = No 1 x = Y J Xi 

7=0 


(3.79) 



That is, if the determinant of the matrix is nonzero, each eigenvalue is 
nonzero and the sum of the eigenvalues is equal to N times the signal 
power. Furthermore, since the AC matrix is nonnegative definite, all the 
eigenvalues are nonnegative: 
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X t Js 0, 0 i < N — 1 



(3.80) 



Once the eigenvalues have been found, the eigenvectors are obtained by 
solving equations (3.68). The eigenvectors associated with different eigen- 
values of a symmetric matrix are orthogonal because of the equality 



V-V — — V'RV —^-V-V- 
' J X- 1 J X- 1 1 



(3.81) 



When all the eigenvalues are distinct, the eigenvectors make an orthonormal 
base and the matrix can be diagonalized as 

R = M' AM (3.82) 



with M the N x N orthonormal modal matrix made of the N eigenvectors, 
and A the diagonal matrix of the eigenvalues; when they have a unit norm, 
the eigenvectors are denoted by C7, and: 

M‘ = [U 0 ,U X ,...U N _ { \, M‘ — M~ l 
A = diag(k 0 , A-i, . . . , X N _\ ) 



For example, take a periodic signal x(n) with period N. The AC function is 
also periodic with the same period and is symmetrical. The AC matrix is a 
circulant matrix, in which each row is derived from the preceding one by 
shifting. Now, if |S(k)| 2 denotes the signal power spectrum and T n the 
discrete Fourier transform (DFT) matrix of order N: 



T n = 



1 1 

1 w 



1 

N - 1 



1 W 



N - 1 



W 






w = e~ j2 *' N 



(3.84) 



it can be directly verified that 

RT n = r A ,diag(|S(C)| 2 ) (3.85) 

Due to the periodicity assumed for the AC function, the same is also true for 
the discrete cosine Fourier transform matrix, which is real and defined by 

T cN = \[T n +T* n ] (3.86) 

Thus 

RT cN = 7^diag(|S(k)| 2 ) (3.87) 

and the N column vectors of T cN are the N orthogonal eigenvectors of the 
matrix R. Then 
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(3.88) 



i? = ir fA ,diag(|5(A:)| 2 )r f7 v 

So, it appears that the eigenvalues of the AC matrix of a periodic signal are 
the power spectrum; and the eigenvector matrix is the discrete cosine 
Fourier transform matrix. 

Flowever, the diagonalization of an AC matrix is not always unique. Let 
us assume that the N cisoids in the signal x(ti) have frequencies &>,• which are 
no longer multiples of 2 ir/N: 

N 

x(n) = J2 S ‘ eJnW ‘ (3.89) 

1=1 

The ACF is 



r(p) - jr \Sj\ 2 e Jpo> ‘ 

(=i 



(3.90) 



and the AC matrix can be expressed as 
R = M*dvdg(\S t \ 2 )M 
with 



>1 




J(02 


. . e M- 1)<«2 


,j<t>N 


e KN- l)m N 



(3.91) 



But the column vectors in M* are neither orthogonal nor eigenvectors of R, 
as can be verified. If there are K cisoids with K < N, M becomes a K x N 
rectangular matrix and factorization (3.91) is still valid. But then the signal 
space dimension is restricted to the number of cisoids K, and N — K eigen- 
values are zero. 

The white noise is a particularly simple case because R = a 2 / v and all the 
eigenvalues are equal. If that noise is added to the useful signal, the matrix 
a~I N is added to the AC matrix and all the eigenvalues are increased by a 2 . 

Example 

Consider the sinusoid in white noise 



x(n) — Vl sin(/ 2 &/) + e(n) (3.92) 

The AC function is 

r(p) — cos {poo) + (T 2 8(p ) (3.93) 
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Eigenvalues 



The eigenvalues of the 3x3 AC matrix are 



R = 



>(0) /•( 1 ) K 2) 
Kl) KO) K 1) 
K 2) Kl) KO) 



T 1 — cos 2 n) 

A -2 — T 2 T cos 2 &) 
2 

1 3 = a e 



and the unit norm eigenvectors are 



1 71 = V! 



1 

0 

-1 



Vi = 



(1 + 2cos 2 a>) 1/2 



COStt) 

1 

cos co 



(3.94) 



t/ 3 = 



1 

(2 + 4 cos 2 &>) 1/2 



1 

—2 coso) 

1 



The variations of the eigenvalues with frequency are shown in Figure 3.4. 

Once a set of N orthogonal eigenvectors has been obtained, any signal 
vector X(n) can be expressed as a linear combination of these vectors, which, 
when scaled to have a unit norm, are denoted by [/,■: 

n-\ 

X(n ) = .(«)£/,■ (3.95) 

i= 0 




FIG. 3.4 Variation of eigenvalues with frequency. 
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The coefficients a t {n) are the projection of X(n) on the vectors f/,-. Another 
expression of the AC matrix can then be obtained, assuming real signals: 

N - 1 

R = E[X(n)X‘(n)] = £ £[«?(«)] U, U\ (3.96) 

1=0 

The definition of the eigenvalues yields 

E[aj(n)\ = k t (3.97) 

Equation (3.97) provides an important interpretation of the eigenvalues: 
they can be considered as the powers of the projections of the signal vectors 
on the eigenvectors. The subspace spanned by the eigenvectors correspond- 
ing to nonzero eigenvalues is called the signal subspace. 

The eigenvalue or spectral decomposition is derived from (3.96): 

JV— 1 

R = J2^i U i U ‘i (3-98) 

1=0 

which is just a more explicit form of diagonalization (3.82). It is a funda- 
mental result which shows the actual constitution of the signal and is 
exploited in subsequent sections. For signals in noise, expression (3.98) 
can serve to separate signal subspace and noise subspace. 

Among the eigenparameters the minimum and maximum eigenvalues 
have special properties. 



3.7. EIGENFILTERS 



The maximization of the signal-to-noise ratio (SNR) through FIR filtering 
leads to an eigenvalue problem [14]. 

The output power of an FIR filter is given in terms of the input AC 
matrix and filter coefficients by equation (3.60): 

£[/(«)] = H'RH 

If a white noise with power a 2 e is added to the input signal, the output SNR 
is 



SNR = 



H'RH 
H' Hal 



(3.99) 



It is maximized by the coefficient vector H, which maximizes H'RH , subject 
to the constraint H'H = 1. Using a Lagrange multiplier, one has to max- 
imize H'RH + 1(1 — H'H) with respect to H , and the solution is RH — XH. 
Therefore the optimum filter is the signal AC matrix eigenvector associated 
with the largest eigenvalue, and is called the maximum eigenfilter. Similarly, 
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the minimum eigenfilter gives the smallest output signal power. These filters 
are characterized by their zeros in the complex plane. 

The investigation of the eigenfilter properties begins with the case of 
distinct maximum or minimum eigenvalues; then it will be shown that the 
filter zeros are on the unit circle. 

Let us assume that the smallest eigenvalue k min is zero. The correspond- 
ing eigenvector C/ min is orthogonal to the other eigenvectors, which span the 
signal space. According to the harmonic decomposition of Section 3.11, the 
matrix R is the AC matrix of a set of N — 1 cisoids, and the signal space is 
also spanned by N — 1 vectors V{. 



Therefore £/ min is orthogonal to all the vectors V r and the N — 1 zeros of the 
corresponding filter are e Jm ‘( I ^ i ^ N — 1), and they are on the unit circle 
in the complex plane. 

Now, if A. min is not zero, the above development applies to the matrix 
(R - Wv), which has the same eigenvectors as R , as can be readily ver- 
ified. 

For the maximum eigenvector t/ max corresponding to k max , it is sufficient 
to consider the matrix (A. max /jv — R), which has all the characteristics of an 
AC matrix. Thus the maximum eigenfilter also has its zeros on the unit circle 
in the r-plane as soon as k max is distinct. 

The above properties can be checked for the example in the preceding 
section, which shows, in particular, that the zeros for C/ min are e ±Ja> . 

Next, if the minimum (or maximum) eigenvalue is multiple, for example 
N — K, it means that the dimension of the signal space is K and that of the 
noise space is N — K. The minimum eigenhlters, which are orthogonal to the 
signal space, have K zeros on the unit circle, but the remaining N — 1 — K 
zeros may or may not be on the unit circle. 

We give an example for two simple cases of sinusoidal signals in noise. 

The AC matrix of a single cisoid, with power S 2 , in noise is 



e 



1 sS i ^ N - 1 



e AN-l)u>, 




e 



.2 




s 2 e j(N-l)a>- 

s 2 e j(N-2)a> 



R = 



(3.100) 



g2 e ~j(N-l)co g2 e -j(N-2)a> 




The eigenvalues are 
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Xi = NS 2 + a?, Xj — a 2 , 2 < i < TV 



and the maximum eigenfilter is 
1 



Umax — 



Vn 



i 



-ypV-l)<w 



(3.101) 



The corresponding filter z-transfer function is 



H m {") = 



i 7 s -e iNa 
x/N z-e Ja> 



e -j(N-X)w 



and the iV — 1 roots 



(3.102) 



z,- = 1 ^ i < TV - 1 



are spread on the unit circle, except at the frequency co. H M (z) is the con- 
ventional matched filter for a sine wave in noise. 

Because the minimum eigenvalue is multiple, the unnormalized eigenvec- 






- £ vie* i - 1)a 



v 2 



(3.103) 



v N 



where N — 1 arbitrary scalars v, are introduced. 

Obviously there are N — 1 linearly independent minimum eigenvectors 
which span the noise subspace. The associated filter z-transfer function is 

N 

HJz) - (z - e ja ) J2 Vitf- 2 + z'- 3 e> + ■ ■ • + e Ki ~ 2)w ] (3. 104) 

i=2 



One zero is at the cisoid frequency on the unit circle; the others may or may 
not be on that circle. 

The case of two cisoids, with powers S 2 and S 2 in noise leads to more 
complicated calculations. The correlation matrix 

sf + si + at S , ?e M + s\e jayi : S?e ,w “ 1)<Bl + s|e XJV_1)< ” 2 

R = S?e _M + Sle-** Sj + Sj + or 2 : S 2 t e J(N ~ 2)wi + Sje J(N - 2)o>2 

+ S 2 e -M-1)„ 2 S 2 e -J ( N-2M + S 2 e -AN-2)„ 2 $ 2 + g 2 + ^ 

has eigenvalues [15] 
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*1 =^ + f[S? + S 2 2 ] + 




(S? - Si) 2 + N 2 SlSlF 2 (wi - co 2 ) 



^2 



°? + j[Sf + S?] + 




(Sf - Si f + N 2 SlSlF 2 (wi - co 2 ) 



X,- — a], 3 < z < A 



(3.105) 



F(a>) is the familiar function 



sin(/V<w/2) 
N sin(<u/2) 



(3.106) 



These results, when applied to a sinusoid amplitude A , x(zz) = A sin(«&>), 
yield 





sin(Wai)' 

sincu 



(3.107) 



The extent to which X x and X 2 reflect the powers of the two cisoids 
depends on their respective frequencies, through the function F(co), which 
corresponds to a length- ,V rectangular time wnidow. For N large and fre- 
quencies far apart enough, 

F(co x - co 2 ) ^>0, A, = NSj + a 2 e , X 2 = NS 2 + a 2 e (3. 108) 

and the largest eigenvalues represent the cisoid powers. 

The r-transfer function of the minimum eigenfilters is 

H m (z) = (z - e M )(z - e jar )P(z) (3. 109) 

with P( z) a polynomial of degree less than N — 2. Two zeros are on the unit 
circle at the cisoid frequencies; the other zeros may or may not be on that 
circle. 

To conclude: for a given signal the maximum eigenhlter indicates where 
the power is in the frequency domain, and the zeros of the minimum eigen- 
value filter give the exact frequencies associated with the harmonic decom- 
position of that signal. 

Together, the maximum and minimum eigenfilters constitute a powerful 
tool for signal analysis. However, in practice, the appeal of that technique is 
somewhat moderated by the computation load needed to extract the eigen- 
parameters, which becomes enormous for large matrix dimensions. Savings 
can be obtained by careful exploitation of the properties of AC matrices 
[16]. For example, the persymmetry relation (3.64) yields, for any eigenvec- 
tor V h 
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JftRVj — /.,,/ v Vj — RJ N Vj 



Now, if Xj is a distinct eigenvalue, the vectors V, and J N V, are colinear, 
which means that V i is also an eigenvector of the co-identity matrix J N , 
whose eigenvalues are ±1. Hence the relation 

J N V, = ±Vi (3.110) 

holds. 

The corresponding property of the AC matrix can be stated as follows: 
the eigenvectors associated with distinct eigenvalues are either symmetric or 
skew symmetric; that is, they verify (3.110). 

Iterative techniques help manage the computation load. Before present- 
ing such techniques, we give additional properties of extremal eigenvalues. 



3.8. PROPERTIES OF EXTREMAL EIGENVALUES 

In the design process of an adaptive filter it is sometimes enough to have 
simple evaluations of the extremal eigenvalues A. max and k mjn . A loose bound 
for the maximum eigenvalue of an AC matrix, derived from (3.79), is 

7-max =5 Na 2 x (3.111) 

with a\ the signal power and N x N the matrix dimension. A tighter bound, 
valid for any square matrix R with entries r ir is known from matrix theory 
to be 

JV-l 

max E \ry\ (3.112) 

] i = o 



N - 1 

max^k,! 

' 7=0 

To prove the inequality, single out the entry with largest magnitude in the 
eigenvector K max and bound the elements of the vector RV m ax . 

In matrix theory, k max is called the spectral radius. It serves as a matrix 
norm as well as the right side of (3.1 12). 

The Rayleigh quotient of R is defined by 

V'RV 

R a {V) = -r^ppr, V^O (3.113) 

As shown in the preceding section, 
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^max ^ 



or 



^max ^ 




(3.114) 



^max = max R a ( V) 

The diagonalization of R yields 

R = Tr'diag(A,)/W (3.115) 

It is readily verified that 

R- 1 = M“'diag^M (3.1 16) 

Therefore /„ ml - n is the maximum eigenvalue of R 1 . The condition number of 
R is defined by 

cond(fl) = ||i?|| ||^ _1 || (3.117) 



If the matrix norm ||7?|| is A max , then 

cond(7?) = — (3.118) 

7-min 

The condition number is a matrix parameter which impacts the accuracy 
of the operations, particularly inversion [9]. It is crucial in solving linear 
systems, and it is directly related to some stability conditions in LS adaptive 
filters. 

In adaptive filters, sequences of AC matrices with increasing dimensions 
are sometimes encountered, and it is useful to know how the extremal 
eigenvalues vary with matrix dimensions for a given signal. Let us denote 
by C max V the maximum unit-norm eigenvector of the N x N AC matrix 
R n . The maximum eigenvalue is 

^ max, A 1 — ^max,N^N^max,N (3.119) 

Now, because of the structure of the (N + 1) x ( N + 1) AC matrix, the 
following equation is valid: 

kmax. V — [Cnax : .V> 



K 0) 


Ki) 


r(N - 1) 


r(N) 


Ki) 


KO) 


r(N - 2) 


r(N - 1) 




Rn 






r(N - 1) 


r(N- 2) 


KO) 


Kl) 


r(N) 


r(N - 1) • ■ ■ 


Ki) 


KO) 



(3.120) 



At the dimension N + 1, A. max ^+1 is defined as the maximum of the 
product U r N+i R N+ i U N+l for any unit-norm vector U N+ \. The vector 
obtained by appending a zero to N is such a vector, and the following 
inequality is proven: 
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^-ma x,JV ^ ^■max.A'+l 



( 3 . 121 ) 



Also, considering the minimization procedure, we have 

kmin.iV ^ ^-min.A'+l (3.122) 

When N approaches infinity, k max and A. min approach the maximum and the 
minimum, respectively, of the signal power spectrum, as shown in the next 
section. 



3.9. SIGNAL SPECTRUM AND EIGENVALUES 

According to relation (3.79), the eigenvalue extraction can be viewed as an 
energy decomposition of the signal. In order to make comparisons with the 
spectrum, we choose the following definition for the Fourier transform Y(f) 
of the signal x(n): 

Y(f ) = lim E x(n)e- j2 * fn (3. 123) 

n^OO y/2N + 1 Trf 

The spectrum is the square of the modulus of Y(f): 

S(f) = Y(f)Y(f) = \Y(f)\ 2 (3.124) 

When the summations in the above definition of S(f) are rearranged, the 
correlation function r{p) shows up, and the following expression is obtained: 

OO 

S(f) = E r (P)^' 2ltfP (3-125) 

p =— OO 



Equation (3.125) is appropriate for random signals with statistics that are 
known or that can be measured or estimated. 

Conversely, the spectrum S(f) is a periodic function whose period is the 
reciprocal of the sampling frequency, and the correlation coefficients are the 
coefficients of the Fourier series expansion of S(f): 

f \/2 

r (p) — I S{f)e j27ipJ df (3.126) 

J- 1/2 

In practice, signals are time limited, and often a finite-duration record of N 0 
data representing a single realization of the process is available. Then it is 
sufficient to compute the spectrum at frequencies which are integer multiples 
of 1 /No, since intermediate values can be interpolated, and the DFT with 
appropriate scaling factor 

> 
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i N 0 - 1 

Y(k ) = —= J2 X(ri)e~ i( - 2nmnk (3 . 1 27) 

V ^0 n — o 

is employed to complete that task. The operation is equivalent to making the 
signal periodic with period N 0 ; the corresponding AC function is also per- 
iodic, with the same period, and the eigenvalues of the AC matrix are 
|T(k)| 2 ,0 < k < N 0 - 1. 

Now, the N eigenvalues A.,- of the N x N AC matrix R N and their asso- 
ciated eigenvectors V,- are related by 

X i V*V i =V*R N V i (3.128) 

The right side is the power of the output of the eigenfilter; it can be 
expressed in terms of the frequency response by 

r l/2 

V* Rn Vj — / \H l (f )\ S(f)df (3.129) 

7-1/2 

The left side of (3.115) can be treated similarly, which leads to 

min S(f) ^ Xj ^ max S(f) (3.130) 

-l/2</< 1/2 -l/2s£/=S 1/2 



It is also interesting to relate the eigenvalues of the order N AC matrix to 
the DFT of a set of N data, which is easily obtained and familiar to practi- 
tioners. If we denote the set of N data by the vector X N , the DFT, expressed 
by the matrix T n (3.84), yields the vector Y n : 

Yn — —j= T n X n 



The energy conservation relation is verified by taking the Euclidean norm of 
the complex vector Y n : 



TA* V V* V 

— 1 N 1 N — A i 



N^N 



Or, explicitly, we can write 

N-l N—\ 

^ imi 2 = x>(«)i 2 

k= 0 n = 0 



The covariance matrix of the DFT output is 
E[Y n Y* n \ = ^T n RT n 
The entries of the main diagonal are 

E[\Y{k)\ 2 }=^nR N V k 



(3.131) 



(3.132) 
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with 



y* = [l >e J 2g / N t . . . ; e iVn/mN-Y)] 

From the properties of the eigenvalues, the following inequalities are 
derived: 

Amax 5* max E[\Y(k)\ 2 ] 

Amin > max E[\Y(k)\ 2 ] (3.133) 

Osg k^N-l 

These relations state that the DFT is a filtering operation and the output 
signal power is bounded by the extreme eigenvalues. 

When the data vector length N approaches infinity, the DFT provides the 
exact spectrum, and, due to relations (3.130) and (3.133), the extreme eigen- 
values A min and A max approach the extreme values of the signal spectrum 

[17] . 

3.10. ITERATIVE DETERMINATION OF EXTREMAL 
EIGENPARAMETERS 

The eigenvalues and eigenvectors of an AC matrix can be computed by 
classical algebraic methods [9]. Flowever, the computation load can be enor- 
mous, and it is useful to have simple and efficient methods to derive the 
extremal eigenparameters, particularly if real-time operation is envisaged. 
A first, gradient-type approach is the unit-norm constrained algorithm 

[18] . It is based on minimization or maximization of the output power of a 
filter with coefficient vector H(n), as shown in Figure 3.5, using the eigen- 
filter properties presented in Section 3.7. The output of the unit-norm filter 
is 



e(n) = 



H‘(n)X(n) 

[H l (n)H(n)] l/2 



The gradient of e(n) with respect to H(n) is the vector 



Ve(«) = 



1 



[H\n)H(n)} { ' 2 L 



X(n) — e(n) 



H(n) 



[H , (n)H(n)]^ 2 _ 



(3.134) 



(3.135) 



Now, the power of the sequence e{k) is minimized if the coefficient vector at 
time n + 1 is taken as 



H(n + 1) = H(n) — 8e(n)Ve(n) 



(3.136) 



> 
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x(n+1-N) 



x(n) x(n-l) 




FIG. 3.5 Unit -norm constrained adaptive filter. 



where <5, the adaptation step size, is a positive constant. After normalization, 
the unit-norm filter coefficient vector is 



H(n + 1) 

l|tf(»+l)ll 



with 



1 

\\H{n+ 1)|| 



H(n) - 



Se(n) 

wmm 



i^X(n) — e(n) 



inn) \ 

\\H(n)\\) 



(3.137) 



\\M(n)\\ = [H‘(n)H(n)] 1 ' 2 



In the implementation, the expression contained in the brackets is com- 
puted first and the resulting coefficient vector is then normalized to unit 
norm. In that way there is no roundoff error propagation. The gradient-type 
approach leads to the eigenequation, as can be verified by rewriting equation 
(3.136): 



H(n + 1) = H(n) - 



l|tf(«)ll 



-e\n) . H(n) 



\\H(n)\\ 



\\H(n)\\ 



Taking the expectation of both sides, after convergence, yields 

D H (°°) „ 2, A1 H(oo) 

II 7/(oo) II e n II //(oo)ll 



(3.138) 



(3.139) 



The output signal power is the minimum eigenvalue, and H(oo) is the 
corresponding eigenvector. Changing the sign in equation (3.136) leads to 
the maximum eigenvalue instead. 

The step size 8 controls the adaptation process. Its impact is analyzed 
indepth in the next chapter. 
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Faster convergence can be obtained by minimizing the conventional cost 
function 

n 

J(n) = W n ~ p e 2 (p), 0 « W < 1 (3.140) 

p= i 

using a recursive LS algorithm [19]. The improvement in speed and accuracy 
is paid for by a significant increase in computation load. Furthermore, 
because of approximations made in the derivation, an initial guess for the 
coefficient vector sufficiently close to the exact solution is needed to achieve 
convergence. In contrast, a method based on the conjugate gradient techni- 
que converges for any initial guess in approximately M steps, where M is the 
number of independent eigenvalues of the AC matrix [20]. 

The method assumes that the AC matrix R is known, and it begins with 
an initial guess of the minimum eigenvector // min (0) and with an initial 
direction vector. The minimum eigenvalue is computed as 
C L n(0)^C m in(0), and then successive approximations U mm {k) are developed 
to minimize the cost function U'RU in successive directions, which are R- 
conjugates, until the desired minimum eigenvalue is found. 

The FORTRAN subroutine is given in Annex 3.2. 



3 . 11 . ESTIMATION OF THE AC MATRIX 

The AC matrix can be formed with the estimated values of the AC function. 
The bias and variance of the estimators impact the eigenparameters. The 
bias can be viewed as a modification of the signal. For example, windowing 
effects, as in (3.21), smear the signal spectrum and increase the dimension of 
the signal subspace, giving rise to spurious eigenvalues [21]. The effects of 
the estimator variance can be investigated by considering small random 
perturbations on the elements of the AC matrix. In adaptive filters using 
the AC matrix, explicitly or implicitly as in fast least squares (FLS) algo- 
rithms, random perturbations come from roundoff errors and can affect, 
more or less independently, all the matrix entries. 

Let us assume that the matrix R has all its eigenvalues distinct and is 
affected by a small perturbation matrix A R. The eigenvalues and vectors are 
explicit functions of the matrix elements, and their alteration can be devel- 
oped in series; considering only the first term in the series, the eigenvalue 
equation with unit-norm vectors is 

(R + A R)(t7, + At/,) = (X,- + A X,)(Ui + At/,-), 0 < i ^ N — 1 

(3.141) 

Neglecting the second-order terms and premultiplying by U\ yields 
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A Xj — UjARUj 



(3.142) 



Due to the summing operation in the right side, the perturbation of the 
eigenvalue is very small, if the error matrix elements are i.i.d. random vari- 
ables. 

In order to investigate the eigenvector deviation, we introduce the nor- 
malized error matrix A E, associated with the diagonalization (3.82) of the 
matrix R: 

A E = A 1/2 MAf?M'A“ 1/2 (3.143) 

We can write (3.141), without the second-order terms and taking (3.142) 
into account, 

(R - VaOA U, = ( U t Uj - 1^) A R U i (3.144) 

After some algebraic manipulations, we get 

A U, = J2 AE ( k . o U k (3. 145) 

IS! k i ~ k k 

k* 1 



where the A E(k, i) are the elements of the normalized error matrix. 

Clearly, the deviation of the unit-norm eigenvectors Uj depends on the 
spread of the eigenvalues, and large deviations can be expected to affect 
eigenvectors corresponding to close eigenvalues [22], 

Overall, the bias of the AC function estimator affects the AC matrix 
eigenvalues, and the variance of errors on the AC matrix elements affects 
the eigenvector directions. 

In recursive algorithms, the following estimation appears: 

n 

R N (n) = W n - p X(n)X\n ) (3.146) 

p= i 



where W is a weighting factor (0 <^C W ^ 1 ) and X(n) is the vector of the N 
most recent data. In explicit form, assuming A(0) = 0, we can write 



R N (n) = 



t W n ~‘Ai) 

1=1 

E tC-'W- 1)40 

i—2 



E W n ~'x(i)x(i - 1) 
1=2 

E w n ~‘Ai- 1) 

i=2 



- N + 1 ) 

i=N 



n 



W n ~ l x(i)x(i — N + 1 ) 

_ i=N 



jr W n ~‘x 2 (i -N+l) 

i=N 



(3.147) 
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The matrix is symmetric. For large n it is almost doubly symmetric. Its 
expectation is 






( 1 - 1F>(0) (1 — ^"“'irll) 

(1 - IF" _1 K1) (1 - W'-'/i 0) 

(1 - W n ~' v+, )r(N- I) 



For large n 

WnV)] » 



(1 - W) n ~ N ~ l r{N- 1 ) 



(1 _ w n ~ N+1 )r( 0 ) 

(3.148) 



(3.149) 



In these conditions, the eigenvectors of R N (n ) are those of R. and the eigen- 
values are multiplied by ( 1 — W )~ 1 . 

Example 



x(n) = sin^n— J, n > 0 
x(n ) — 0, n 0 



The eigenvalues of the 8x8 AC matrix can be found from (3.105) in which 

e _ e _ 1 _ 77 

m — *2 — 2’ "i ~ w 2 — 2 

so that the term in the square root vanishes. Expression (3.107) can be used 
as well, with A — 1: 

Xi = X 2 = 2, = ■ ■ ■ = A 8 = 0 

The eigenvalues of the matrix R (n) 

R\n ) = „ Rn ° 1) , W = 0.95 

2 £ (T”-A 2 (/) 

Z=1 

are shown in Figure 3.6 for the first values of n. They approach the theore- 
tical values as n increases. 
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2 



Eigenvalues 




3.12. EIGEN (KL) TRANSFORM AND 
APPROXIMATIONS 

The projections of a signal vector X on the eigenvectors of the AC matrix 
form a vector 

[u\ = M , X (3.150) 

where M is the N x N orthonormal modal matrix defined in Section 3.6. 
The transform is unitary (M'M — In) and called the Karhunen-Loeve (KL) 
transform. It is optimal for the class of all signals having the same second- 
order statistics [23]. Optimality means the efficiency of a transform in 
achieving data compression: the KL transform provides the optimum sets 
of data to represent signal vectors within a specified mean square error. For 
example, if M out of the N eigenvalues are zero or negligible, the N element 
data vectors can be represented by N — M numbers only. 

To prove that property we assume that the elements of the vector X are N 
centered random variables and look for the unitary transform / which best 
compresses the N elements of X into M(M < N) elements out of the N 
elements y, of the vector Y given by 

Y — TX 

The mean square error is 
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N 

mse = £ E ^ 

i=M + 1 

If the new vectors of T are designated by Vjj then 

N 

MSE = J2 KiE[XX']V n 

i=M + 1 

The minimization of the above expression under the contraint of unit norm 
vectors, using Lagrange multipliers, leads to: 

E[XX‘]V n = a, V Ti , M+ 1 i ^ N 

The minimum is obtained if the scalars a,- are the N — M smallest eigenva- 
lues of the matrix EIXX 1 ] and V T j the corresponding unit norm eigenvectors. 
The minimum mean square error is 

(MSE) min = f] X ‘ 

i=M + 1 



and, in fact, referring to Section 3.6, it is the amount of signal energy which 
is lost in the compression process. 

However, compared with other unitary transforms like the DFT, the KL 
transform suffers from several drawbacks in practice. First, it has to be 
adjusted when the signal second-order statistics change. Second, as seen in 
the preceding sections, it requires a computation load proportional to N 2 . 
Therefore it is helpful to find approximations which are sufficiently close for 
some signal classes and amenable to easy calculation through fast algo- 
rithms. Such approximations can be found for the first-ordr AR signal. 

Because of the dual diagonalization relation 

R~ 1 =M t A~ l M (3.151) 



the KL transform coefficients can be found from the inverse AC matrix as 
well. For the first-order unity-variance AR signal, the AC matrix is given by 
(3.67). The inverse (3.68) is a tridiagonal matrix, and the elements of the KL 
transform for N even are [24] 



m kn = c n sin 



(Q n 



k — 



N + 1 




(3.152) 



where c„ are normalization constants and a>„ are the positive roots of 



tan(An>) = — 



(1 — a 2 ) sin co 
cos co — 2a + a 2 cos co 



(3.153) 



The eigenvalues of R are 
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1 ^ i < iV 



(3.154) 



k,- = 



1 -ct 



(1 — 2a cos co j + a 2 ) 1/2 



Now, the elements of the KL transform of a data vector are 

N 

a k = c„x(«) sin 

77=1 



k - 



N+\ 



7T 



(3.155) 



Due to the nonharmonicity of sine terms, a fast algorithm is unavailable in 
calculating the above expressions, and N 2 computations are required. 
However, if R _1 is replaced by 



R' = 



1 



1 "|~ a — a 

— a 1 4- cc 

0 —a 



0 ■■■ 0 

—a ■ ■ ■ 0 

1 + a 1 ■■■ 0 



0 



0 



: ' . —a 

0 — a 1 4“ a 



(3.156) 



where R' differs by just the first and last entries in the main diagonal, the 
elements of the modal matrix become 



m kn 



-sin 



kmc 



N + 1 \N + 1 
and the eigenvalues are 



X- — 1 — 2 ^cos 

1 + a 2 



ITT 

tv+T 



i= 



The elements of the corresponding transform of a data vector are 



a k 



N- 



- x(n) sin 

1 n= 1 



nkn 
N+ 1 



(3.157) 



(3.158) 



(3.159) 



This defines the discrete sine transform (DST), which can be implemented 
via a fast Fourier transform (FFT) algorithm. 

Finally, for an order 1 AR signal, the DST is an efficient approximation 
of the KL tranform. 

Another approximation is the discrete cosine transform (DCT), defined 
as 



/? N 

«o = 

n= 1 



> 
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(3.160) 



„ (2n-l)ht , , Ar , 

aj! — — 2_^ x ( n ) cos ~ , 1 ^ k ^ N — 1 



i=i 



2N 



It can be extended to two dimensions and is widely used in image processing 
[25], 



3.13. SUMMARY 

Estimating the ACF is often a preliminary step in signal analysis. After 
definition and basic properties have been introduced, efficient estimation 
techniques have been compared. 

The AC matrix is behind adaptive filtering operations, and it is essential 
to be familiar with its major characteristics, which have been presented and 
illustrated by several simple examples. The eigenvalue decomposition has a 
profound meaning, because it leads to distinguishing between the signal or 
source space and the noise space, and to extracting the basic components. 
The filtering aspects help to understand and assess the main properties of 
eigenvalues and vectors. The extremal eigenparameters are especially crucial 
not only for the theory but also because they control adaptive filter perfor- 
mance and because they can provide superresolution analysis techniques. 

Perturbations of the matrix elements, caused by bias and variance in the 
estimation process, affect the processing performance and particularly the 
operation of FLS algorithms. It has been shown that the bias can affect the 
eigenvalues and the variance causes deviations of eigenvectors. The KL 
transform is an illustrative application of the theoretical results. 



EXERCISES 

1. Use the estimators r x (p) and r 2 (p) to calculate the ACF of the sequence 

x(n) — sin^;;^, 0 ^ n ^ 15 

How are the deviations from theoretical values affected by the signal 
frequency? 

2. For the symmetric matrix 



1.1 


-0.6 


0.2 


-0.6 


1.0 


-0.4 


0.2 


-0.4 


0.6 



'j i (&\ 

calculate R and R and the first element / q 0 of the main diagonal of 
R 4 . Compare the ratio ^oo / r oo w dh t ^ ie l ar g est eigenvalue iV max . 
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Show that the following approximation is valid for a symmetric 
matrix R and N sufficiently large: 



R 



7V+1 



R 



1.0 


0.7 


0.0 


0.7 


1.0 


0.7 


0.0 


0.7 


1.0 



1.00 


0.65 


0.10 


0.65 


1.00 


0.65 


0.10 


0.65 


1.00 



This expression can be used for the numerical calculation of the extre- 
mal eigenvalues. 

3. For the AC matrix 



R = 



calculate its eigenvalues and eigenvectors and check the properties 
given in Section 3.6. Verify the spectral decomposition (3.98). 

4. Find the frequency and amplitude of the sinusoid contained in the 
signal with AC matrix 



R = 



What is the noise power? Check the results with the curves in Figure 
3.4. 

5. Find the spectral decomposition of the matrix 



R = 



What is the dimension of the signal space? Calculate the projections of 
the vectors 

X‘(n) = [cos (n ^ , cos[(« - 1) , cos[(« -2)^], 

xcos[(„-3)*]], n — 0, 1,2, 3 

on the eigenvectors. 

6. Consider the order 2 AR signal 

x(n) — 0.9 x(n — 1) — 0.5 x(n — 2) + e(n) 

with £[e 2 («)] = a 2 = 1. Calculate its ACF and give its 3 x 3 AC matrix 
R 3 . Find the minimum eigenvalue and eigenvector. Give the corre- 



1.0 


0.7 


0.0 


-0.7 


0.7 


1.0 


0.7 


0.0 


0.0 


0.7 


1.0 


0.7 


-0.7 


0.0 


0.7 


1.0 
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sponding harmonic decomposition of the signal and compare with the 
spectrum. 

Calculate the 4x4 matrix and its inverse R 4 1 . Comment on the 
results. 

7. Give expressions to calculate the DST (3.159) and the DCT by a 
standard DFT. Estimate the computational complexity for N — 2 P . 



ANNEX 3.1 FORTRAN SUBROUTINE TO SOLVE A LINEAR 
SYSTEM WITH SYMMETRICAL MATRIX 

SUBROUTINE CHOL ( N , A , X , B ) 

C 

C SOLVES THE SYSTEM [A]X=B 

C A : SYMMETRIC COVARIANCE MATRIX (N*N) 

C N : SYSTEM ORDER (N > 2) 

C X : SOLUTION VECTOR 

C B : RIGHT SIDE VECTOR 

DIMENSION A (20, 20) , X ( 1 ) ,B(1) 

A ( 2 , 1 ) = A (2,1)/A(1,1) 

A ( 2 , 2 ) =A (2,2) -A (2,1) *A (1,1) *A (2,1) 

D040I=3 ,N 

A ( I , 1 ) =A (1,1) /A (1,1) 

D020 J=2 ,1-1 
S=A ( I , J) 

D010K=1 , J-l 

10 S=S-A(I,K) *A(K,K) *A( J,K) 

20 A ( I , J) =S/A ( J, J) 

S=A (1,1) 

D030K=1 ,1-1 

30 S=S-A(I,K)*A(K,K)*A(I,K) 

40 A ( I , I ) =S 
X ( 1 ) =B ( 1 ) 

D060I=2 ,N 
S=B ( I ) 

D050 J=1 ,1-1 
50 S=S-A( I , J) *X ( J) 

60 X(I)=S 

X (N ) =X ( N ) /A(N,N) 

D080K=1 ,N-1 
I=N-K 

S=X ( I ) /A (1,1) 

D070J=I+1,N 
70 S=S-A( J,I) *X( J) 
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80 X ( I ) =S 
RETURN 
END 
C 



ANNEX 3.2 FORTRAN SUBROUTINE TO COMPUTE 
THE EIGENVECTOR CORRESPONDING 
TO THE MINIMUM EIGENVALUE BY THE 
CONJUGATE GRADIENT METHOD [20] 

(' Courtesy of Tapan K. Sarkar, Department of 
Electrical Engineering, Syracuse University, 

Syracuse, N. Y. 13244-1240 ) 

SUBROUTINE GMEVCG(N, X, A, B , U, SML , W, M) 

C 

C THIS SUBROUTINE IS USED FOR ITERATIVELY FINDING THE 

C EIGENVECTOR CORRESPONDING TO THE MINIMUM EIGENVALUE 

C OF A GENERALIZED EIGENSYSTEM AX = UBX . 

C 

C A - INPUT REAL SYMMETRIC MATRIX OF ORDER N, WHOSE 

C MINIMUM EIGENVALUE AND THE CORRESPONDING 

C EIGENVECTOR ARE TO BE COMPUTED. 

C B - INPUT REAL POSITIVE DEFINITE MATRIX OF ORDER N. 

C N - INPUT ORDER OF THE MATRIX A. 

C X - OUTPUT EIGENVECTOR OF LENGTH N CORRE SPONDING TO 

C THE MINIMUM EIGENVALUE AND ALSO PUT INPUT 

C INITIAL GUESS IN IT. 

C U - OUTPUT MINIMUM EIGENVALUE . 

C SML - INPUT UPPER BOUND OF THE MINIMUM EIGENVALUE . 

C W - INPUT ARBITRARY VECTOR OF LENGTH N. 

CM - OUTPUT NUMBER OF ITERATIONS. 

C 

LOGICAL AAEZ , BBEZ 

REAL A(N,N) , B (N,N) , X(N) , P(5), R(5),W(N), AP ( 5 ) , 

* BP ( 5 ) , AX ( 5 ) , BX ( 5 ) 

NU = 0 
M = 0 
U1 = 0.0 

1 DO 20 1=1, N 

BX ( I ) =0.0 
DO 10 J=1 , N 

BX ( I ) = BX ( I ) + B ( I , J ) *X ( J ) 
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10 CONTINUE 
20 CONTINUE 
XBX = 0.0 
DO 30 1=1, N 

XBX = XBX + BX(I) *X(I) 

30 CONTINUE 

XBX = SQRT(XBX) 

DO 40 1 = 1, N 

X(I) = X(I) /XBX 
40 CONTINUE 
DO 60 1 = 1, N 
AX ( I ) =0.0 
DO 50 J=1 , N 

AX ( I ) = AX ( I ) + A ( I , J ) *X ( J ) 
50 CONTINUE 
60 CONTINUE 
U = 0.0 
DO 70 1 = 1, N 

U = U + AX (I) *X(I) 

70 CONTINUE 
DO 80 1 = 1, N 

R ( I ) = U*BX ( I ) - AX(I) 

P(I) = R(I) 

80 CONTINUE 
2 DO 100 1 = 1, N 

AP ( I ) =0.0 
DO 90 J=1 , N 

AP ( I ) = AP ( I ) + A ( I , J ) *P ( J) 
90 CONTINUE 
100 CONTINUE 

DO 120 1 = 1, N 
BP ( I ) =0.0 
DO 110 J=1,N 

BP ( I ) = BP ( I ) + B ( I , J ) *P ( J) 
110 CONTINUE 
120 CONTINUE 
PA = 0 . 0 
PB = 0.0 
PC = 0.0 
PD = 0.0 
DO 130 1 = 1, N 

PA = PA + AP ( I ) *X ( I ) 

PB = PB + AP (I) *P (I) 

PC = PC + BP ( I ) *X ( I ) 

PD = PD + BP (I) *P (I) 
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130 CONTINUE 

AA = PB *PC - PA*PD 
BB = PB - U*PD 
CC = PA - U*PC 

AAE Z = AB S ( AA ) .LE.l.OE-75 
BBEZ = ABS(BB) .LE. l.OE-75 
IF (AAEZ .AND. BBEZ ) GO TO 12 
IF (AAEZ ) GO TO 11 
DD = -BB + SQRT ( BB*BB-4 . 0*AA*CC ) 
T = DD/(2.0*AA) 

GO TO 15 

11 T = -CC/BB 
GO TO 15 

12 T = 0.0 

15 DO 140 1 = 1, N 

X(I) = X(I) + T*P ( I ) 

140 CONTINUE 

DO 160 1 = 1, N 
BX ( I ) =0.0 
DO 150 J=1 , N 

BX ( I ) = BX ( I ) + B ( I , J ) *X ( J ) 
150 CONTINUE 
160 CONTINUE 
XBX = 0.0 
DO 170 1 = 1, N 

XBX = XBX + BX(I) *X(I) 

170 CONTINUE 

XBX = SQRT ( XBX ) 

DO 180 1 = 1, N 

X(I) = X ( I ) /XBX 
180 CONTINUE 

DO 200 1 = 1, N 
AX ( I ) =0.0 
DO 190 J=1 , N 

AX (I) = AX ( I ) + A ( I , J ) *X ( J ) 
190 CONTINUE 
200 CONTINUE 
U = 0.0 
DO 210 1 = 1, N 

U = U + AX ( I ) *X ( I ) 

210 CONTINUE 

AI = ABS (U1 - U) 

AJ = ABS (U) *l.OE-03 
AK = AI - AJ 

IF (AK .LT. 0.0) GOTO 3 
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DO 220 1=1, N 

R ( I ) = U*BX(I) - AX(I) 

220 CONTINUE 
QN = 0 . 0 
DO 230 1=1, N 

QN = QN + R(I) *AP ( I ) 

230 CONTINUE 
Q = -QN/PB 
DO 240 1=1, N 

P(I) = R(I) + Q*P ( I ) 

240 CONTINUE 
M = M + 1 
U1 = U 

C WRITE (3, 9998) M 

9998 FORMAT (/IX, 3HM =, 13) 

C WRITE (3,9997) 

9997 FORMAT (/2H, U/) 

C WRITE (3, 9996) U 

9996 FORMAT (IX, E 14. 6) 

C WRITE (3, 9995) 

9995 FORMAT (/5H X( I)/) 

C WRITE (3, 9994) X 

9994 FORMAT ( IX, F11.6) 

GO TO 2 

3 CONTINUE 

IF (U .LT. SML ) RETURN 
NU = NU + 1 
CX = 0.0 
DO 250 1=1, N 

CX = CX + W(I) *BX ( I ) 

250 CONTINUE 

CX = CX/XBX 
DO 260 1 = 1, N 

W(I) = W(I) - CX*X(I) 

X(I) = W(I) 

260 CONTINUE 

IF (NU .GT. N) GO TO 4 
GO TO 1 

4 WRITE (3, 9999) 

9999 FORMAT ( 2 8H NO EIGENVALUE LESS THAN SML ) 
STOP 

END 
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4 

Gradient Adaptive Filters 



The adaptive filters based on gradient techniques make a class which is 
highly appreciated in engineering for its simplicity, flexibility, and robust- 
ness. Moreover, they are easy to design, and their performance is well char- 
acterized. By far, it is the most widely used class in all technical fields, 
particularly in communications and control [1, 2], 

Gradient techniques can be applied to any structure and provide simple 
equations. However, because of the looped structure, the exact analysis of 
the filters obtained may be extremely difficult, and it is generally carried out 
under restrictive hypotheses not verified in practice [3, 4]. However, simpli- 
fied approximate investigations provide sufficient results in the vast majority 
of applications. 

The emphasis is on engineering aspects in this chapter. Our purpose is to 
present the results and information necessary to design an adaptive filter and 
build it successfully, taking into account the variety of options which make 
the approach flexible. 



4.1. THE GRADIENT— LMS ALGORITHM 

The diagram of the gradient adaptive filter is shown in Figure 4.1. The error 
sequence e(n) is obtained by subtracting from the reference signal y(n) the 
filtered sequence y(ri). The coefficients C,(«), 0 < i ^ N — 1, are updated by 
the equation 

Ci(n + 1) = c t (n) - S + ^ e(n + 1) (4. 1 ) 

oCi(n) 

> 
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FIG. 4.1 Principle of a gradient adaptive filter. 



The products [3 e{n + l)/3c, •(»)]£■(« + 1) are the elements of the vector V G , 
which is the gradient of the function \e 2 (n + 1). The scalar 8 is the adaptation 
step. In the mean, the operation corresponds to minimizing the error power, 
hence the denomination least means squares (LMS) for the algorithm. 

The adaptive filter can have any structure. However, the most straight- 
forward and most widely used is the transversal or FIR structure, for which 
the error gradient is just the input data vector. 

The equations of the gradient adaptive transversal filter are 

e(n + 1) = y(n + 1) - H‘(n)X(n + 1) (4.2) 

and 

H(n + 1) = H(n) + 8X(n+ 1 )e(n + 1 ) (4.3) 

where H\n) is the transpose of the coefficient vector and X(n +1) is the 
vector of the N most recent input data. 

The implementation is shown in Figure 4.2. It closely follows the imple- 
mentation of the fixed FIR filter, a multiplier accumulator circuit being 
added to produce the time-varying coefficients. Clearly, 2N + 1 multiplica- 
tions are needed, as well as 2N additions and 2N active memories. 

Once the number of coefficients N has been chosen, the only filter para- 
meter to be adjusted is the adaptation step 8. 

In view of the looped configuration, our first consideration is stability. 

4.2. STABILITY CONDITION AND SPECIFICATIONS 

The error sequence calculated by equation (4.2) is called “a priori,” because 
it employs the coefficients before updating. The “a posteriori” error is 
defined as 
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FIG. 4.2 Gradient adaptive transversal filter. 



e(n + 1) = y(n + 1 ) - H‘(n + 1 )X(n + 1 ) (4.4) 

and it can be computed after (4.2) and (4.3) have been completed. Now, 
from (4.2) and (4.3), (4.4) can be written as 

e(n + 1 ) = e(n + 1 )[1 — SX‘(n + 1 )X(n + 1 )] (4.5) 



The system can be considered stable if the expectation of the a posteriori 
error magnitude is smaller than that of the a priori error, which is logical 
since more information is incorporated in s(n + 1). If the error e(n + 1) is 
assumed to be independent of the N most recent input data, which is 
approximately true after convergence, the stability condition is 



|1 -&E[X\n+ 1 )X{n+ 1)]| < 1 



which yields 



0 < S < 



2 

Na^ 



(4-6) 



(4.7) 



where the input signal power a], is generally known or easy to estimate. 

The stability condition (4.7) is simple and easy to use. However, in prac- 
tice, to account for the hypotheses made in the derivation, it is wise to take 
some margin. For example, a detailed analysis for Gaussian signals shows 
that stability is guaranteed if [5, 6] 



0 < S < - 



1 2 



3 Ncr~ 



(4.8) 



> 
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So, a margin factor of a few units is recommended when using condition 
(4.7). Once the stability is achieved, the final determination of the step 5 in 
the allowed range is based on performance, compared to specifications. 

The two main specifications for gradient adaptive filtering are the system 
gain and the time constant. The system gain G 2 S can be defined as the 
reference to error signal power ratio: 



G 2 _ E[y 2 (n)\ 
S E[e\n)\ 



(4.9) 



For example, in adaptive prediction, G s is the prediction gain. The specifi- 
cation is given as a lower bound for the gain, and the adaptation step and 
the computation accuracy must be chosen accordingly. 

The speed of adaptation is controlled by a time constant specification r e , 
generally imposed on the error sequence. The filter time constant r can be 
taken as an effective initial time constant obtained by fitting the sequence 
E[e 2 (n)\ to an exponential for n — 0 and n — 1, which yields 

(£[<2 2 (0)] - E[e 2 (oo)])e~ 2/T = E[e 2 ( 1 )] - F[e 2 (oo)] (4.10) 



Since r is related to the adaptation step 8, as shown in the following 
sections, imposing an upper limit r e puts a constraint on <5. Indeed the 
gain and speed specifications must be compatible and lead to a nonempty 
range of values for 8; otherwise another type of algorithm, like least squares, 
must be relied upon. 

First, the relation between adaptation step and residual error is investi- 
gated. 



4.3. RESIDUAL ERROR 

The gradient adaptive filter equations (4.2) and (4.3) yield 

H(n +l) = [I N - 8X(n + 1 )X\n + 1 )\H(n) + 8X(n + \)y(n +1) (4.11) 

When the time index n approaches infinity, the coefficients reach their 
steady-state values and the average of //(« + 1 ) becomes equal to the aver- 
age of H(n). Hence, assuming independence between coefficient variations 



and input data vectors, we get 
E[H(oc)\ = R ~ 1 r yx = // opt 


(4.12) 


Using the notation of Section 1.4, we write 
R — E[X{n)X’ {ri)\, r yx — E[X(n + 1 )y(n + 1 )] 


(4.13) 
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Therefore the gradient algorithm provides the optimal coefficient set H opl 
after convergence and in the mean. The vector r yx is the cross-correlation 
between the reference and input signals. 

The minimum output error power E mm can also be expressed as a func- 
tion of the signals and their cross-correlation. 

For the set of coefficients H(n), the mean square output error E(n) is 



E(n) = E[{ y(n) - H'(n)X(n)f] 


(4.14) 


Now, setting the coefficients to their optimal values 


gives 


fi'min = E[y 2 (n)\ - 


(4.15) 


or 




E mm = E[y 2 (n)\ - H‘ opi r yx 


(4.16) 


or 




Emm = E[v 2 (n)\ - r‘ yx R-\- yx 


(4.17) 


In these equations the filter order N appears as the dimension of the AC 
matrix R and of the cross-correlation vector r yx . 

For fixed coefficients Ef(n ) the mean square error (MSE) E(n) can be 
rewritten as a deviation from the minimum: 


E(n) = E lmn + [H opt - H(n)]'R[H opl - H{n )] 


(4.18) 


The input data AC matrix R can be diagonalized as 




R = M'diag(A. ; )M, M'M = I N 


(4.19) 


where, as shown in the preceding chapter, A,(0 < / ^ N — 1) are the eigen- 
values and M the modal unitary matrix. 

Letting 


[«(«)] = M[H opt - H(n )] 


(4.20) 


be the coefficient difference vector in the transformed space, we obtain the 
concise form of (4.18) 


E(n) = E m m + [a(«)]'diag(l, ■)[«(«)] 


(4.21) 


Completing the products, we have 




N- 1 

E(n) - E m m + X! ^(n) 

7=0 


(4.22) 


If A denotes the column vector of the eigenvalues X h 
column vector with elements a 2 (n), then 


and [a 2 (n) ] denotes the 
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(4.24) 



(4.25) 



(4.26) 



E(n) = E mm + A'[or(*)] (4.23) 

The analysis of the gradient algorithm is carried out by following the 
evolution of the vector [«(«)] according to the recursion 

[a(n + 1)] = [»(«)] — SMX(n + 1 )e(n + 1) 

The corresponding covariance matrix is 

[a(n + 1 )][a(n + 1)]' = [a{n)][a(n)]' - 2 8MX(n + 1 )e(n + 1 )[a(n)] 1 

+ S 2 e 2 (n + \)MX(n + \)X\n + 1 )M' 

The definition of e(n + 1) yields 

e(n + 1) = y(n + 1) - H r opt X(n + 1) + X'(n + 1 )M'[a(n)\ 

Equations (4.25) and (4.26) determine the evolution of the system. In order 
to get useful results, we make simplifying hypotheses, particularly about 
e\n) [7]. 

It is assumed that the following variables are independent: 

The error sequence when the filter coefficients are optimal 
The data vector X(n + 1) 

The coefficient deviations H(n) — H opl 

Thus 

E{[y(n + 1) - H' opt X(n + 1 )\X‘(n + 1 )M , [a(n)]} = 0 (4.27) 

Although not rigorously verified, the above assumptions are reasonable 
approximations, because the coefficient deviations and optimum output 
error are noiselike sequences and the objective of the filter is to make 
them uncorrelated with the N most recent input data. Anyway, the most 
convincing argument in favor is that the results derived are in good agree- 
ment with experiments. 

Now, taking the expectation of both sides of (4.25), yields 

E{[a{n + 1 )][a(n + 1 )]'} = [I N - 2 S diag(k ; )]£’{[o'(«)][a:(n)]' l 

, , (4.28) 

+ S-E[e~(n + 1 )] diaga,) 

For varying coefficients, under the above independence hypotheses, expres- 
sion (4.23) becomes 



E[e 2 (n + 1)] = E min + A 'E[a 2 (n)\ 



(4.29) 



Considering the main diagonals of the matrices, and using vector nota- 
tion and expression (4.29) for the error power, we derive the equation 



E[a\n + 1)] = [I N - 2 diag(k ; ) + 5 2 AA']£[« 2 («)] + S 2 E mm A 



(4.30) 
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A sufficient condition for convergence is that the sum of the absolute 
values of the elements of any row in the matrix multiplying the vector 
E[u 2 {n)\ be less than unity: 

0 < 1 - 2 S),,- + <1, 0 < li < N — 1 (4.31) 



from which we obtain the stability condition 



0 < S < 



2 

i 

£ h 

j = 0 



2 

No 2 



which is the condition already found in Section 4.2, through a different 
approach. 

Once the stability conditions are fulfilled, recursion (4.28) yields, as 
n — > oo, 

£{[a(°c)][a(oo)]'} = ^ E{cc)I n (4.32) 



Due to the definition of the vector [«(«)], equation (4.32) also applies to the 
coefficient deviations themselves. Thus the coefficient deviations, after con- 
vergence, are statistically independent and have the same power. 

Now, combining (4.32) and (4.29) yields the residual error E R . 



E(oo) = E r = 



1 - (6/2 )Na~ 



(4.33) 



Finally, the gradient algorithm produces an excess output MSE related to 
the adaptation step. Indeed, when <5 approaches the stability limit, the out- 
put error power approaches infinity. The ratio of the steady-state MSE to 
the minimum attainable MSE is called the final misadjustment M ad j: 



; '^adj — 




1 

1 - (S/2 )No 2 



(4.34) 



In practical realizations, due to the margin generally taken for the adap- 
tation step size, the approximation 



\+-Na 2 x 



(4.35) 



is often valid, and the excess output MSE is approximately proportional to 
the step size. In fact, it can be viewed as a gradient noise, due to the 
approximation of the true cost function gradient by an instantaneous value. 
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4.4. LEARNING CURVE AND TIME CONSTANT 



The adaptive filter starts from an initial state, which often corresponds to 
zero coefficients. From there, its evolution is controlled by the input and 
reference signals, and it is possible to define learning curves by parameter 
averaging. 

The evolution of the coefficient difference vector in the transformed space 
is given by quation (4.24). Substituting equation (4.26) into this equation 
and taking the expectation yields, under the hypotheses of Section 4.3, 

E[a(n + 1)] = [I N - 8 diag(k,)]f?[a(n)] (4.36) 

Substituting into equation (4.29) and iterating from the time origin leads to 

E(n) - E mm = A' diag(l - 8X i ) ln E[a 1 m (4.37) 

The same results can also be derived from equation (4.30) after some sim- 
plification, assuming the step size 8 is small. 

Clearly, the evolution of the coefficients and the output MSE depends on 
the input signal matrix eigenvalues, which provide as many different modes. 
In the long run, it is the smallest eigenvalue which controls the convergence. 

The filter time constant r e obtained from an exponential fitting to the 
output rms error is obtained by applying definition (4.10) and neglecting the 
residual error: 



E( 0)e~ 2/z ' = A' diag(l - ^,)F[a 2 (0)] 



(4.38) 



We can also obtain it approximately by applying (4.29) at the time origin: 



A'F[a 2 (0)] 




= A' diag(l - 2(5A,)£[cr(0)] 



(4.39) 



Flence 



£ k,E{alm 

__ 1 i=0 

Ze ~$N- 1 

E k,F{a 2 (0)}k ; - 



If the eigenvalues are not too dispersed, we have 



r e 



N 

TV— 1 

5 E *■! 

i= 0 



1 

8a 2 x 



(4.40) 



(4.41) 



The filter time constant is proportional to the inverse of the adaptation 
step size and of the input signal power. Therefore, an estimation of the 
signal power is needed to adjust the adaptation speed. Moreover, if the 
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signal is nonstationary, the power estimation must be carried out in real 
time to reach a high level of performance. 

A limit on the adaptation speed is imposed by the stability condition 
(4.7). 

From equation (4.30), it appears that the rows of the square matrix are 
quadratic functions of the adaptation step and all take their minimum norm 
for 



N - 1 

£*/ 

(=0 



1 

Nnl 



(4.42) 



which corresponds to the fastest convergence. Therefore the smallest time 
constant is 



r ■ — N 



(4.43) 



In these conditions, if the eigenvalues are approximately equal to the 
signal power, which occurs for noiselike signals in certain modeling applica- 
tions, the learning curve, taken as the output MSE function, is obtained 
from (4.36) by 



E(n) -E r = (E( 0) - E r ) 




(4.44) 



For zero initial values of the coefficients, is(0) is just the reference signal 
power. 

Overall, the three expressions (4.7), (4.33), and (4.41) give the basic infor- 
mation to choose the adaptation step <5 and evaluate a transversal gradient 
adaptive filter. They are sufficient in many practical cases. 

Example 

Consider the second-order adaptive FIR prediction filter in Figure 4.3, with 
equations 

e(n + 1) = x(n + 1) — ci\{n)x(n) — a 2 (n)x(n — 1) 



a\(n + 1) 
a 2 (n + 1) 



(«) , 0 

a 2 (n) 



x(n) 

(n - 1) 



e(n + 1) 



(4.45) 



The input signal is a sinusoid in noise: 

x{n) — sin^^j + b(n) (4.46) 

The noise b(n) has power <r| = 5 x 10 -5 . The input signal power is er| = 0.5. 
The step size 8 is 0.05. Starting from zero-valued coefficients, the evolution 
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xCn) * 



e(n) 




FIG. 4.3 Second-order prediction filter. 



of the output error, the two coefficients, and the corresponding zeros in the 
complex plane are shown in Figure 4.4. Clearly the output error time con- 
stant is in reasonably good agreement with estimation (4.41). 

In the filter design process, the next step is the estimation of the coeffi- 
cient and internal data word lengths needed to meet the adaptive filter 
specifications. 



4.5. WORD-LENGTH LIMITATIONS 

Word-length limitations introduce roundoff error sources, which degrade 
the filter performance. The roundoff process generally takes place at the 
output of the multipliers, as represented by the quantizers Q in Figure 4.5. 

In roundoff noise analysis a number of simplifying hypotheses are gen- 
erally made concerning the source statistics. The errors are identically dis- 
tributed and independent; with rounding, the distribution law is uniform in 
the interval [—q/2, q/ 2], where q is the quantization step size, the power is 
q 1 /\2, and the spectrum is flat. 

Concerning the adaptive transversal filter, there are two different cate- 
gories of roundoff errors, corresponding to internal data and coefficients [8]. 

The quantization processes at each of the N filter multiplication outputs 
amount to adding N noise sources at the filter output. Therefore, the output 
MSE is augmented by Nq 1/12, assuming q 2 is the quantization step. 

The quantization with step q x of the multiplication result in the coeffi- 
cient updating section is not so easily analyzed. Recursion (4.28) is modified 
as follows, taking into account the hypotheses on the roundoff noise sources 
and their independence of the other variables: 

E\[a(n + 1 )][«(« + 1)]'} = [I N - 28 diag(k ; )]£{H»)]K«)]'} 

a 2 (4.47) 

+ 8 2 E[e 2 (n + 1 )]diag(/. ; ) + EL /, v 
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H.6 +j 




(b) (c) 

FIG. 4.4 The second-order adaptive FIR prediction filter: (a) output error 
sequence; (b) coefficient versus time; (c) zeros in the complex plane. 
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FIG. 4.5 Adaptive FIR filter with word-length limitations. 



An additional gradient noise is introduced. 

When n —¥■ oo, equation (4.29) yields, as before, 




q[N 
122 8 



(4.48) 



Hence, the total residual error, taking into account the quantization of the 
filter coefficients with step q x and the quantization of internal data with step 
< 72 , as shown in Figure 4.5, is 



Ert 



1 

1 - {8/2 )Nol 




28 12 



■ N 



(4.49) 



or, assuming a small excess output MSE, 

E„^E mm {, + S -N^ + ^_ + N‘A (4.50) 



This expression shows that the effects of the two kinds of quantizations are 
different. Because of the factor i, the coefficient quantization and the corre- 
sponding word length can be very sensitive. In fact, there is an optimum <5 opl 
for the adaptation step size which minimizes the total residual error; accord- 
ing to (4.50) it is obtained through derivation as 



2 f^min ; A<t( 



Zl-L = o 

2 U8l pt 



(4.51) 
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and 



o 1 1 gi 

° pt VE—*,V 3 2 



(4.52) 



The curve of the residual error versus the adaptation step size is shown in 
Figure 4.6. For 5 decreasing from the stability limit, the minimum is reached 
for <5 opt ; if S is decreased further, the curve indicates that the total error 
should grow, which indeed has no physical meaning. The hypotheses 
which led to (4.50) are no longer valid, and a different phenomenon occurs, 
namely blocking. 

According to the coefficient evolution equation (4.3), the coefficient /i,(n) 
is frozen if 



I Sx(n - i)e(ri)\ < y (4.53) 

Let us assume that the elements of the vector 8X(ri)e(ri) are uncorrelated 
with each other and distribute uniformly in the interval [q t /2, q\/2\. Then 

8 2 E{e 2 {n)X(n)X\n)} = %I N (4.54) 
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If the coefficients are close to their optimal values and if the input signal can 
be approximated by a white noise, then equations (4.54) and (4.51) are 
equivalent. A blocking radius p can then be defined for the coefficients by 

p~ = E{[H(n) - H ovl ]\H{n) - H) opt ]} (4.55) 

Now, considering that 

H(n) - // opt - R - 1 E[e(n)X(n)] (4.56) 

we have, from (4.54) and the identity X'X — trace(AA'), 



P 



2 



1 

12 







-2 



(4.57) 



The blocking radius is a function of the spread of the input AC matrix 
eigenvalues. Blocking can occur for adaptation step sizes well over <5 opt , 
given by (4.52), if there are small eigenvalues. 

In adaptive filter implementations, the adaptation step size is often 
imposed by system specifications (e.g., the time constant), and the coefficient 
quantization step size cp is chosen small enough to avoid the blocking zone 
with some margin. 

Quantization steps q\ and q 2 are generally derived from expression (4.50). 
Considering the crucial advantage of digital processing, which is that opera- 
tions can be carried out with arbitrary accuracy, the major contribution in 
the total residual error should be the theoretical minimal error E min . In a 
balanced realization, the degradations from different origins should be simi- 
lar. Hence, a reasonable design choice is 



1 

2 




NScrl 

2 



Nq[ = d 

2<5 12 12 



(4.58) 



If b c is the number of bits of the coefficients and /; max is the largest coefficient 
magnitude, then, assuming fixed-point binary representation, we have 

<7i = h max 2 l ~ b ‘ (4.59) 

Under these conditions 



2lb c _ 



2 hi 



3 8 2 E, r 



(4.60) 



with the assumption that i: min is the dominant term in (4.50), that is, 

/'~’2 T? 2 

min ~ 



By introducing the time constant specification x e , one has approximately 
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O', 



b c ^ log 2 (r e ) + log 2 (G 5 ) + log 2 h max — 






(4.61) 



This expression gives an estimation of the coefficient word length necessary 
to meet the specifications of a gradient adaptive filter. However there is one 
variable which is not readily available, /i max ; a simple bound can be derived, 
if we assume a large system gain and refer to the eigenfilters of Section 3.7: 

u v 2 = E[y 2 (n)\ « H'(n)RH(n) Ss X mm H‘(n)H(n) (4.62) 

Now 



and 




(4.63) 



Therefore, the last term on the right side of (4.61) is bounded by zero for 
input signals whose spectrum is approximately flat, but it can take positive 
values for narrowband signals. 

Estimate (4.61) can produce large values for b c ; that word length is 
necessary in the coefficient updating accumulator but not in the filter multi- 
plications. 

In practice, additional quantizers can be introduced just before the multi- 
plications by hj(ri) in Figure 4.5 in order to avoid multiplications with high 
precision factors. The effects of the additional roundoff noise sources intro- 
duced that way can be investigated as above. 

Often, nonstationary signals are handled, and estimate (4.61) is for sta- 
tionary signals. In this case, a first approach is to incorporate the signal 
dynamic range in the last term of (4.61). 

To complete the filter design, the number of bits b t of the internal data 
can be determined by setting 

q 2 — max{|x(«)|, \y{ri)\}2 l ~ bi (4.64) 

with the assumption that a 2 x ^ a 2 , which is true in linear prediction and 
often valid in system modeling, and taking the value 4 as the peak factor of 
the signal x(n) as in the Gaussian case. Thus 

q 2 = 4(T x 2 l ~ b ‘ 



Now, (4.58) yields 



2 lb ‘ = 2 4 - 



l4 1 



3 E min 8 



> 
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By introducing the specifications we obtain 

* 2 + log 2 + log 2 (G s ) + \\og 2 {r e ) (4.65) 

This completes the implementation parameter estimation for the stan- 
dard gradient algorithm. However, some modifications can be made to 
this algorithm, which are either useful or even mandatory. 

4.6. LEAKAGE FACTOR 

When the input signal vanishes, the driving term in recursion (4.3) becomes 
zero and the coefficients are locked up. In such conditions, it might be 
preferable to have them return to zero. This is achieved by the introduction 
of a leakage factor y in the updating equation: 

H(n + 1) = (1 — y)H(n) + 8X(n + 1 )e(n + 1) (4.66) 

The coefficient recursion is 

H{n + 1) = [(1 - k)/,v - SX(n + 1 )X'(n + 1 )]//(«) + Sy(n + 1 )X(n + 1) 

(4.67) 

After convergence, 

H x = E[H(o o)] =[R + U N \\ yx (4.68) 

The leakage factor y introduces a bias on the filter coefficients, which can be 
expressed in terms of the optimal values as 

= + (4-69) 

The same effect is obtained when a white noise is added to the input 
signal x(n); a constant equal to the noise power is added to the elements 
of the main diagonal of the input AC matrix. 

To evaluate the impact of the leakage factor, we rewrite the coefficient 
vector Hoo as 

H x = M r diag( - MH opt (4.70) 

The significance of the bias depends on the relative values of k min and |. 

Another aspect is that the cost function actually minimized in the whole 
process is 

J Y (n) - E\[y(n ) - X'(n)H(n - l)] 2 + ^H\n - 1 )H{n - 1) j (4.71) 
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The last term represents a constraint which is imposed on the coefficient 
magnitudes [9]. 

The LS solution is given by (4.68), and the coefficient bias is 



H - // opt = 






R - Ik 



H, 



opt 



Hence the filter output MSE becomes 
Er = Emm + [H- H opt ]'R[H - H opi ] 



(4.72) 



(4.73) 



The leakage factor is particularly useful for handling nonstationary signals. 
With such signals, the leakage value can be chosen to reduce the output 
error power. 

If the coefficients are computed by minimizing the above cost function 
taken on a limited set of data, the coefficient variance can be estimated by 

E{[H - H 0 ][H - Hq] 1 } = E R [R + £/*]■'*[* + (4.74) 



and the coefficient MSE H MSE is 

Z/mse = [H- H opi ]'[H - H opt ] + trac e(E{[H - H 0 ][H - //„]'}) (4.75) 

When y increases from zero, // MSE decreases from E R ln\cc(R '), then 
reaches a minimum and increases, because in (4.75) the variance decreases 
faster than the bias increases at the beginning, as can be seen directly for 
dimension N — 1 [9]. A minimal output MSE corresponds to the minimum 
°f Hmse- 

A similar behavior can be observed when the gradient algorithm is 
applied to nonstationary signals. An illustration is provided by applying a 
speech signal to an order 8 linear predictor. The prediction gain measured is 
shown in Figure 4.7 versus the leakage factor for several adaptation step 
sizes S. The maximum of the prediction gain is clearly visible. It is also a 
justification for the values sometimes retained for speech prediction, which 
are S — 2~ 6 and y — 2~ 8 . 

The leakage factor, which can nicely complement the conventional gra- 
dient algorithm, is recommended for the sign algorithm because it bounds 
the coefficients and thus prevents divergence. 



4.7. THE LMAV AND SIGN ALGORITHMS 

Instead of the LS, the least absolute value (LAV) criterion can be used to 
compare variables, vectors, or functions. It has two specific advantages: it 
does not necessarily lead to minimum phase solutions; it is robust to outliers 
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FIG. 4.7 Prediction gain vs. leakage factor for a speech sentence. 






in a data set. Similarly, the least mean absolute value (LMAV) can replace 
the LMS in adaptive filters [10]. 

The gradient of the function \e(n + 1)| is the vector whose elements are 

= k ' *" +,) - X ’ f “ + ,4.76) 

= —x(n + 1 — /)sign e(n + 1) 



where sign e is +1 if e is positive and —1 otherwise. The LMAV algorithm 
for the transversal adaptive filter is 

H(n + 1) = H(n ) + SX(n + l)sign e(n + 1) (4.77) 



where A, a positive constant, is the adaptation step. 

The convergence can be studied by considering the evolution of the coef- 
ficient vector toward the optimum H opl . Equation (4.77) can be rewritten as 

H(n + 1) - H opt — H(n) - H opt + A X(n + l)sign e{n + 1) 

Taking the norm squared of both sides yields 

[H(n + 1) - H opi ]’[H(n + 1 ) - H opl ] = [H(n) - H opt ] l [H(n) - H opt ] 

+ 2 A sign e(n + 1 )X\n + 1 )[H(n) - H opl ] + A 2 X\n + \ )X(n + 1) 

(4.78) 



or, with further decomposition, 

II H(n + 1) - /7 opt || 2 —\\H(n) - 7/ opt || 2 + A 2 ||A(« + 1)|| 2 - 2A|e(n + 1)| 
+ 2 A sign e{n + 1 )[y(n + 1) - X\n + l)77 opl ] 

Hence we have the inequality 
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\\H(n+l)-H opt \\ 2 ^ ||//(«)-// opt || 2 + A 2 ||X(«+l)|| 2 -2Ak(»+l)l 
+ 2/±\y{n+\)-X , {n+\)H opl \ 



Taking the expectation of both sides gives 
E{\\H(n + 1) - 7/ opl || 2 } ^ || H{n) - // opt || 2 

+ A~N<7~ — 2AE{\e(n + 1) | } + 2A£' m | n 

(4.79) 



where the minimal error fi min is 

E mm = E [ | y(n + 1) - X\n + l)H opl \\ (4.80) 

If the system starts with zero coefficients, then 
E{\\H(n + 1) — 7/ opt l| 2 } ^ ll^optll 2 

n + 1 

+ (n + 1)( A 2 Nal + 2Afi min ) - 2A £ E{\e(p)\} 

P =\ 



Since the left side is nonnegative, the accumulated error is bounded by 



1 

n + 1 



E 



n + 1 

X! i e <»i 



P =\ 



^Na~ + E m in 




2A(«+ 1) 



(4.81) 



This is the basic equation of LMAV adaptive filters. It has the following 
implications: 



Convergence is obtained for any positive step size A. 
After convergence the residual error E R is bounded by 

Er < E min + — N cr~ 



(4.82) 



It is difficult to define a time constant as in Section 4. 1 . However, an adap- 
tation time r A can be defined as the number of iterations needed for the last 
term in (4.81) to become smaller than E mm . Then we have 



Ct 



1 II H, 



opt II 



A 2 E^ 



(4.83) 



The performance of the LMAV adaptive filters can be assessed from the 
above expressions. A comparison with the results given in Sections 4.3 and 
4.4 for the standard LMS algorithm clearly shows the price paid for the 
simplification in the coefficient updating circuitry. The main observation is 
that, if a small excess output MSE is required, the adaptation time can 
become very large. 
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Another way of simplifying gradient adaptive filters is to use the follow- 
ing coefficient updating technique: 

H(n + 1) = H(n) + A e(n + l)sign X(n + 1) (4.84) 



This algorithm can be viewed as belonging to the LMS family, but with a 
normalized step size. Since 



S1 g n x — — 



(4.85) 



and |x| can be coarsely approximated by the efficient value a x , equation 
(4.84) corresponds to a gradient filter with adaptation step size 




(4.86) 



The performance can be assessed by replacing 8 in the relevant equations. 
Pursuing further in that direction, we obtain the sign algorithm 

H(n + 1) = H(n ) + A sign e(n + l)sign X(n + 1) (4.87) 



The detailed analysis is rather complicated. However, a coarse but generally 
sufficient approach consists of assuming a standard gradient algorithm with 
step size 

5 = — (4.88) 

(J x (j e 



where a x and a e are the efficient values of the input signal and output error, 
respectively. 

In the learning phase, starting with zero-valued coefficients, it can be 
assumed that a e ~ er,. and the initial time constant r s of the sign algorithm 
can be roughly estimated by 






J_ a _y_ 

A a x 



(4.89) 



After convergence it is reasonable to assume a 2 e — E m in . If the adaptation 
step is small, the residual error E RS in the sign algorithm can be estimated by 



Ers 




N A a x \ 
9 /F . I 



(4.90) 



A condition for the above estimation to be valid is obtained by combin- 
ing (4.7) and (4.88), which yields 



A « 



2 

N 
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If the step size is not small enough, the convergence will stop when the error 
becomes so small that the stability limit is reached, approximately 



In that situation, the residual error can be estimated by 




A 

2 



Na x 



(4.91) 



which can be compared with (4.82) when E m \ n is neglected. 

It is worth pointing out that, for stability reasons, a leakage term is 
generally introduced in the sign algorithm coefficient, giving 

H(n + 1) = (1 — y)H(n) + A sign e{n + l)sign X(n + 1) (4.92) 

Under these conditions, the coefficients are bounded by 

I /;,(«) I < A , (4.93) 

Y 

Overall, it can be stated that the sign algorithm is slower than the stan- 
dard gradient algorithm and leads to larger excess output MSE [11-12]. 
However, it is very simple; moreover it is robust because of the built-in 
normalization of its adaptation step, and it can handle nonstationary sig- 
nals. It is one of the most widely used adaptive filter algorithms. 



4.8. NORMALIZED ALGORITHMS FOR 
NONSTATIONARY SIGNALS 

When handling nonstationary signals, adaptive filters are expected to trace 
as closely as possible the evolution of the signal parameters. However, due 
to the time constant there is a delay which leads to a tracking error. 
Therefore the excess output MSE has two components: the gradient mis- 
adjustment error, and the tracking error. 

The efficiency of adaptive filters depends on the signal characteristics. 
Clearly, the most favorable situation is that of slow variations, as mentioned 
in Section 2.13. The detailed analysis of adaptive filter performance is based 
on nonstationary signal modeling techniques. Nonstationarity can affect the 
reference signal as well as the filter input signal. In this section a highly 
simplified example is considered to illustrate the filter behavior. 

When only the reference signal is assumed to be nonstationary, the devel- 
opments of the previous sections can, with adequate modifications, be kept. 
The nonstationarity of the reference is reflected in the coefficient updating 
equation (4.3) by the fact that the optimal vector is time dependent: 
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(4.94) 



H(n + 1) - H opt (n + 1) = H(n) - H opt (n) + 8e(n + 1 )X(n + 1) 

If it can be assumed that the optimal coefficients are generated by a first- 
order model whose inputs are zero mean i.i.d. random variables e nS i (n), 
with variance er„ s , as in Section 2.13, then 

H opt (n + 1) = (1 — y\H ov i(n) + [c„s,o(» +!)>•••, e nS,(N-\)( n + l)] r (4.95) 



Furthermore, if the variations are slow, which implies y ~ 1 . the net effect of 
the nonstationarity is the introduction of the extra term rr nS / N in recursion 
(4.28). As already seen for the coefficient roundoff, the residual error E RTnS 
is 



ErThS 




— f 



N 2 
+ 25 CT " s 



or, for small adaptation step size, 

/ 8 ,\ N , 

ErtuS ^ E mm I 1 + p Ncri j + ^cr„ s 



(4.96) 



(4.97) 



In this simplified expression for the residual output error power with a 
nonstationary reference signal, the contributions of the gradient misadjust- 
ment and the tracking error are well characterized. Clearly, there is an 
optimum for the adaptation step size, <5 opt , which is 

*opt = - i 7= (4.98) 

-^min 

which corresponds to balanced contributions. 

The above model is indeed sketchy, but it provides hints for the filter 
behavior in more complicated circumstances [13]. For example, an order 12 
FIR adaptive predictor is applied to three different speech signals: (a) a male 
voice, (b) a female voice, and (c) unconnected words. The prediction gain is 
shown in Figure 4.8(a) for various adaptation step sizes. The existence of an 
optimal step size is clearly visible in each case. 

The performance of adaptive filters can be significantly improved if the 
most crucial signal parameters can be estimated in real time. For the gra- 
dient algorithms the most important parameter is the input signal power, 
which determines the step size. If the signal power can be estimated, then the 
normalized LMS algorithm 

H(n + 1) = H(n) + ^X(n + 1 )e(n + 1) (4.99) 

<4 

can be implemented. The most straightforward estimation er^ is P x \(ri) given 
by 
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FIG. 4.8 Prediction gain vs. adaptation step size for three speech signals: (a) LMS 
with fixed step; (b) normalized LMS; (c) sign algorithm. 
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(4.100) 




where P Q is a positive constant which prevents division by zero. The para- 
meter N 0 , the observation time window, is the duration over which the 
signal can be assumed to be stationary. 

For the prediction filter example mentioned above, the results corre- 
sponding to P 0 = 0.5 and N 0 — 100 (the long-term speech power is unity) 
are given in Figure 4.8(b). The improvements brought by normalization are 
clearly visible for all three sentences. The results obtained with the sign 
algorithm (4.87) are shown in Figure 4.8(c) for comparison purposes. The 
prediction gain is reduced, particularly for sentences b and c, but the robust- 
ness is worth pointing out: there is no steep divergence for too large 8, but a 
gradual performance degradation instead. 

In practice, equation (4.100) is costly to implement, and the recursive 
estimate of Section 3.3 is preferred: 



Estimates (4.100) and (4.101) are additive. For faster reaction to rapid 
changes, exponential estimations can be worked out. An efficient and simple 
method to implement corresponds to a variable adaptation step size A(n) 
given by 



where /(/?) is an integer variable, itself updated through an additive process 
(e.g., a sign algorithm [14]). 

The step responses of P x \{ri), P x2 ( w ) and the exponential estimate are 
sketched in Figure 4.9. Better performance can be expected with the expo- 
nential technique for rapidly changing signals. 

Adaptation step size normalization can also be achieved indirectly by 
reusing the data at each iteration. 

The a posteriori error e(n + 1) in equation (4.4) is calculated with the 
updated coefficients. It can itself be used to update the coefficients a second 
time, leading to a new error £[(« + 1). After K such iterations, the a poster- 
iori error e K (n + 1 ) is 



For <5 sufficiently small and K large, s K (n + 1)^0, which would have been 
obtained with a step size A satisfying 



P x2 (n +!) = (!- y)P x2 (n) + Yx 2 (n + 1) 



(4.101) 




(4.102) 



e K (n + 1) = [1 - 8X\n + 1 )X(n + 1 )] K+l e(n + 1 ) 



(4.103) 
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1 - A X\n + 1 )X(n + 1) = 0 
that is 



A 



1 

X'(n+ 1 )X(n+ 1) 



(4.104) 



The equivalent step size corresponds to the fastest convergence defined in 
Section 4.4 by equation (4.42). So, the data reusing method can lead to fast 
convergence, while preserving the stability, in the presence of nonstationary 
signals. 

The performance of normalized LMS algorithms can be studied as in the 
above sections, with the additional complication brought by the variable 
step size. For example, considering the so-called projection LMS algorithm 

mn + 1) = Hin) + xt{n+{ \ x{n+[) nn + 1M» + D (4-105) 



one can show that a bias is introduced on the coefficients, which becomes 
independent of the step size for small values, while the variance remains 
proportional to 5 [15]. 

A coarse approach to performance evaluation consists of keeping the 
results obtained for fixed step algorithms and considering the extreme para- 
meter values. 
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4.9. DELAYED LMS ALGORITHMS 



In the implementation, it can be advantageous to update the coefficients 
with some delay, say d sampling periods. For example, with integrated 
signal processors a delay d — 1 can ease programming. In these conditions 
it is interesting to investigate the effects of the updating delay on the adap- 
tive filter performance [16]. 

The delayed LMS algorithm corresponds to the equation 

H(n + 1) = H{n) + 8X(n + 1 — d)e(n + 1 — d) (4. 106) 

The developments of Section 4.3 can be carried out again based on the 
above equation. For the sake of brevity and conciseness, a simplified ana- 
lysis is performed here, starting from equation (4.24), rewritten as 

[a(n + 1)] = [«(«)] — SMX(n + 1 — d)e(n + 1 — d) (4.107) 

Substituting (4.26) in this equation and taking the expectation yields, under 
the hypotheses of Section 4.3, 

E{[a(n + 1)]} = E{[a(n)\} — 8 diag(A. i )£'{[a(n — d)]} (4.108) 

The system is stable if the roots of the characteristic equation 

r d+l - r d + 8^ = 0 (4.109) 

are inside the unit circle in the complex plane. Clearly, for d — 0, the con- 
dition is 

0 <5<— (4.110) 

^■max 

which is a stability condition sometimes used for the conventional LMS 
algorithms, less stringent than (4.7). 

When d — 1, the stability condition is 

0<5<— (4.111) 

■Lnax 

which implies that delay makes the stability condition more stringent. If 8 is 
small enough (<5 < |k max ), the roots of the second-order characteristic equa- 
tion are real: 

r x « 1 - 8X , ■( 1 + <U,), r 2 «= <5a,(1 + SX,) (4. 1 12) 

The corresponding digital filter can be viewed as a cascade of two first- 
order sections, whose time constants can be calculated; its step response is 
approximately proportional to 1 — (1 + <5k,)r", where the factor 1+<U ; 
reflects the effect of the root r 2 . However, neglecting the root r 2 , we can 
state that, for small adaptation step sizes, the adaptation speed of the 
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delayed algorithm is similar to that of the conventional gradient algorithm. 
In the context of this simplified analysis, the time constant r, for each mode 
is roughly 

(4-H3) 

Now, for d ^ 2, the characteristic equation (4.109) has a root on the unit 
circle if 

e Ad+l)oi _ e jda, + SXj = 0 (4.1 14) 

The imaginary part of the equation is 

sin(r/ + l)co — smdco — 0 (4.115) 

whose solutions are 



to — 0; (2d + 1)&> = (2k + \)n (—d < k ^ d) 

As concerns the real part, it provides the equality 

Sl, = 2(— (4.H6) 

At this stage, the root locus technique can be employed. If SXj is increased 
from zero, the first value which corresponds to a root of the equation is 
obtained for k — 0 and k— — 1 , and 

to — n/2(2d + 1) 



The stability is guaranteed if SXj remains smaller than the limit above. Hence 
the stability condition 



„ 2 . JT 

0 < S < sin — — — — 

^•max 2(2 d + 1 ) 

For large d.. the condition simplihes to 

1 JT 



0 < S < 



^•max + 1 



(4.117) 



(4.118) 



Turning to the excess output MSE, a first estimation can be obtained by 
considering only the largest root of the characteristic equation and assuming 
that the delayed LMS is equivalent to the conventional LMS with a slightly 
larger adaptation step. For d — 1, referring to equation (4.112), we can take 
the multiplying factor to be 1 + <U max . The most adverse situation for 
delayed LMS algorithms is the presence of nonstationary signals, because 
the tracking error can grow substantially. 
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4.10. THE MOMENTUM ALGORITHM 



The momentum algorithm is an alternative approach to improve on the 
performance of the gradient algorithm, while sacrificing little in computa- 
tional complexity. 

The starting point is the recursive equation for the output error energy in 
the least squares approach. In Chapter 6, it will be shown that the following 
equation holds: 

E(n + 1) = WE(n) + e(n + 1 )e(« + 1 ) (4. 1 19) 

where W is the weighting factor (0 < W < 1). Assuming that the coefficient 
vector is updated proportionally to the gradient of the error energy E(n + 1 ) 
and approximating the “a posteriori” error e(n + 1) by the “a priori” error 
e(n + 1), the momentum algorithm is obtained: 

e(n + 1) = y(n + 1 ) + H'(n)X(n +1) 

H(n + 1) = H(n ) + a{H(n ) — H(n — 1)] + 8e(n + 1 )X(n + 1) 



The scalar a is called the momentum factor, by analogy with the use of the 
term in mechanics. An obvious condition for stability is |or| < 1. In fact, the 
stability of the momentum algorithm can be investigated in a way similar to 
that of the gradient algorithm. The evolution of the coefficients is governed 
by the equation 

H(n + 1) = [/, v (l + a) - SX(n + \)X\n + 1 )\H(n) ^ 

+ 8y(n + 1 )X(n + 1) — aH(n — 1) 

Replacing X{n + 1 )X\n + 1) by I n No to take a conservative approach, the 
second-order characteristic equation of the system has its roots inside the 
unit circle if 



1 1 + a — 8Na\ \ < 1 + a, a < 1 



(4.122) 



which leads to the stability conditions 



0 < 8 < 



2(1+ a) 

Ncr 2 K 



, a < 1 



(4.123) 



The performance of the algorithm can be evaluated by following a pro- 
cedure similar to that of the standard gradient algorithm, but with increased 
complexity. However, considering that the momentum term introduces a 
first-order difference equation with factor a, a coarse assessment of the 
algorithm’s behavior is obtained by replacing 8 by <5/(1 — a) in the expres- 
sions obtained for the gradient algorithm. For example, this accounts for the 
gain in convergence time observed in simulations [17]. 
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4.11. VARIABLE STEP SIZE ADAPTIVE FILTERING 



The performance of gradient adaptive filters is a compromise between speed 
of convergence and accuracy. A large step size makes the adaptation fast, 
while a small value can make the residual error close to the minimum. 
Therefore, a variable step size can offer a potential for improvement, and 
a possible approach is to apply the gradient algorithm to the step size itself 
[18]. 

Assuming a time-varying step size, the filter output error can be expressed 
by 



e(n + 1) = y(n + 1) - [H(n - 1 ) + 8(n)e{n)X{nfl X(n + 1) (4. 124) 

The step size S(n ) can be updated with the help of the derivative of e 2 (n + 1) 
with respect to S. At time (n + 1), the following operations have to be carried 
out: 



e(n + 1) = y(n + 1 ) — H r (n)X(n + 1) 

8(n + 1) = 8{n) + pe{n + 1 )e{n)X l (n)X(n + 1 ) (4. 125) 

H(n + 1) = H(n) + 8{n + 1 )e{n + 1 )X(n + 1) 

The above equations define a variable-step-size gradient algorithm, and the 
parameter p is a real positive scalar that controls the step size variations. To 
figure out the evolution of the step size, its updating equation can be rewrit- 
ten as 

8 (n + 1) = [1 - pe\n)[X‘{n)X(n + 1 )] 2 ]%) 

+ p[v(« + 1) — H'(n — 1 )X(n + 1 )]e(n)X t (n)X(n + 1) 

Clearly, the step size 8(n) decreases as the filter converges, and its mean value 
stabilizes at a limit which is determined by the correlation of the input signal 
and the correlation of the residual error. 



4.12. CONSTRAINED LMS ALGORITHMS 

The adaptive filters considered so far use a reference signal to compute the 
output error, which serves to update the coefficients. It might happen that 
this reference signal is zero, as in linear prediction. In such a situation, at 
least one constraint must be imposed on the coefficients, to prevent the 
trivial solution of all the coefficients being null. In linear prediction, it is 
the first coefficient which is a one. Another example has been given in 
Section 3.10 with the iterative calculation of the coefficients of an eigenfilter. 
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The case of a set of K independent linear constraints can be dealt with by 
forming a reference signal from the input signal and the constraints, as 
shown in Figure 4.10. The system is defined by the equations 



e(n + 1) = H l (n)X(n + 1 ) 
C'H(n) = F 



(4.127) 



The matrix C is formed by the K constraint vectors, F being a k'-elcment 
vector which is part of the constraint system. Now, a reference signal y q {n) 
can be formed from the input signal with the help of the coefficient vector 
WQ defined by 

WQ = C[C'C]~ l F (4.128) 

The matrix WS is orthogonal to the constraint vector and it has the rank 
N — K. The adaptive filter Ha(z) has N — K coefficients, which are updated 
according to the LMS algorithm [19]. 

The constraints may also come as an addition to an adaptive filter with a 
reference signal. Then the coefficients must be updated in a space which is 
orthogonal to the constraint space. The algorithm is as follows 

e(n + 1) = y(n + 1 ) - H'{n)X{n + 1) 

H(n + 1) = P[H(n) + Se(n + \)X(n + 1 )] + m 



with 

p = i n - cfc'q-'c' m = c[c'q _1 T 

The derivation of the equations (4.128) and (4.129) is obtained through the 
Lagrange multiplier technique, which is detailed in Chapter 7, in the context 
of least squares adaptive filtering. 



X(n) 




FIG. 4.10 Constrained adaptive filter. 
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4.13. THE BLOCK LMS ALGORITHM 



In some applications, it can be convenient to perform the coefficient adap- 
tation less often than each sampling period. In block adaptive filtering, the 
data sequences are arranged into blocks of length L and adaptation is 
carried out only once per block. 

Let X NL (m) denote the N x L-element input signal matrix associated with 
block m and [ y(m)\ and \e(m)\ represent the /.-element vectors of reference 
signal and output error respectively. Then, the block LMS algorithm is 
defined by the set of equations 

e(m + 1)] = \y{m + 1)] - X‘ NL (m + Y)H(m) 

1 (4.130) 

H{m + 1) = H(m) + S — X NL (m + 1 )[e(m + 1)] 



The evolution of the //-element coefficient vector H(m) is determined by 
substituting the error equation into the updating equation, to yield 



H(n + 1) — [/,v — S — X NL (m + 1 )X‘ NL (m + Y)]H(m) 



(4.131) 



+ ^ X m (m + \)[y(m- 



D] 



The important point here is that the data are averaged. For L sufficiently 
large, the following approximation is valid: 

X NL( m + I + 1) ^ LR XX (4.132) 



Thus the stability condition for the step size S is 

0 < 5 < (4.133) 

^max 

If the input signal is close to a white noise, the adaptation time constant, 
expressed in terms of the data period, is 

T = L sk (4 ' 134) 

where cr\ is the input signal power, as usual. As concerns the residual error 
power, it is not necessary to go through all the equations to assess the 
impact of the block processing. The averaging operation carried out on 
the driving term in the equation which gives the evolution of the coefficients 
(4.131) produces a reduction of the error variance by the averaging factor L. 
Thus, the residual error power can be expressed by 
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1 



(4.135) 



Ed — E„ 



1 ~l~L Na * 



Compared to the standard LMS algorithm, it appears that the block algo- 
rithm is slower but has a smoother operation. Also, it cannot track changes 
in the data sequence which are limited to a single block. 

It must be pointed out that some advantages in implementation can be 
gained from the block processing of the data. 



4.14. FIR FILTERS IN CASCADE FORM 

In certain applications it is important to track the roots of the adaptive filter 
z-transfer function — for instance, for stability control if the inverse system is 
to be realized. It is then convenient to design the filter as a cascade of L 
second-order sections H,(z), 1 ^ ^ L, such that 

H/( z ) — 1 + *i/ z 1 + h 2 /z 

For real coefficients, if the roots z t are complex, then 

h\i — 2Re(r/), h 2 ,= \z,\ 2 (4.136) 

The roots are inside the unit circle if 

1*2/1 <1. I*i/I <l+*2/. (4.137) 

The filter transfer function is 

L 

H{z) — ]~~[(1 + h u z 1 + h 2 fZ 2 ) 

i=i 

The error gradient vector is no longer the input data vector, and it must be 
calculated. 

The filter output sequence can be obtained from the inverse r-transform 

y(n) = ^ .J r z n ~ 1 fid + M -1 + h 2l z- 2 )X(z) dz (4.138) 

where r is a suitable integration contour. Hence 
de(n H- 1) dy(n+\) 

dhki dhki 

L 

z~ k f](l + h u z~ l + h 2l z~ 2 )X(z) dz 

1 = 1 
¥‘ 
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or, more concisely, 



de{n + 1) 

dh ki 



J_ f _n : -k H{z) 

2tt/ J y 1 + h\jZ * + hi/Z ^ 



X(z) dz 



(4.139) 



Therefore, to form the gradient term g k j(n) — de(ri)/dh ki , it is sufficient to 
apply the filter output y(n) to a purely recursive second-order section, whose 
transfer function is just the reciprocal of the section with index i. The 
recursive section has the same coefficients, but with the opposite sign. The 
corresponding diagram is given in Figure 4.11. 

The coefficients are updated as follows: 



h ki (n + 1) = h ki (n) + Se(n + 1 )g ki (n +1), k — 1, 2, 1 < / < L (4.140) 

The filter obtained in this way is more complicated than the transversal FIR 
filter, but it offers a simple method of finding and tracking the roots, which, 
due to the presence of the recursive part, should be inside the unit circle in 
the z-plane to ensure stability [20]. 

Flowever, there are some implementation problems, because the indivi- 
dual sections have to be characterized for the filter to work properly. That 
can be achieved by imposing different initial conditions or by separating the 
zero trajectories in the z-plane. 




FIG. 4.11 Adaptive FIR filter in cascade form. 
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4 . 15 . MR GRADIENT ADAPTIVE FILTERS 



In general, HR filters achieve given minimum phase functions with fewer 
coefficients than their FIR counterparts. Moreover, in some applications, it 
is precisely an HR function that is looked for. Therefore, HR adaptive filters 
are an important class, particularly useful in modeling or identifying systems 
[ 21 ]- 

The output of an HR filter is 

L K 

y(n) = Y a ' x ( n - o + Y bk ^ n ~ ^ (4 - 141 ) 

1=0 k= 1 



The elements of the error gradient vector are calculated from the derivatives 
of the filter output: 



9j(») 

da i 



x(n — I) + Y^ 1 



k= 1 



9 y(n — k) 
da i 



0 ^ I ^ L 



and 



dy(n) 

db k 



K 

y(n -k) + Y b i 

i= 1 



dy(n — i) 

db k 



1 ^ k s; K 



(4.142) 



(4.143) 



To show the method of realization, let us consider the 2 - transfer function 



H(z) = 



1=0 



flb k z k 
k= 1 



N{z) 

D(z) 



The filter output can be written 



y(n) = J-[ 

J r 

Consequently 

9v(”) 1 f z n- 

dai 2nj J r 

dj(») 1 f z «- 

db k 2 k j J r 



H(z)X(z) dz 



D(z) 

l z~ k ^- H(z)X(z)dz 
D(z) 



(4.144) 



(4.145) 

(4.146) 



The gradient is thus calculated by applying x(n) and y(n) to the circuits 
corresponding to the transfer function ' .. 
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To simplify the implementation, the second terms in (4.142) and (4.143) 
can be dropped, which leads to the following set of equations for the adap- 
tive filter (in vector notation): 



e(n + 1) = y(n + 1) - [ A\n ), B'(n j\ 



X(n + 1 ) 
Y(n) 



(4.147) 



A(n + 1 ) 
B(n + 1 ) 



An) 

B(n) 




X(n + 1) 
Y(n) 



e(n + 1) 



(4.148) 



The approach is called the output error technique. The block diagram is 
shown in Figure 4.12(a). The filter is called a parallel IIR gradient adaptive 
filter. 

The analysis of the performance of such a filter is not simple, because of 
the vector Y(n) of the most recent filter output data in the system equations. 
To begin with, the stability can only be ensured if the error sequence e(n) is 





FIG. 4.12 Simplified gradient IIR adaptive filters: (a) Parallel type (output error); 
(b) series-parallel type (equation error). 
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filtered by a /-transfer function C(z), such that the function C(:)/D(z) be 
strictly positive real, which means 



Re 



' C(z) 

m. 



> o, 



M = 1 



(4.149) 



An obvious choice is C(z) — D(z). 

An alternative approach to get realizable HR filters is based on the 
observation that, after convergence, the error signal is generally small and 
the filter output y(n) is close to the reference y(ri). Thus, in the system 
equations, the filter output vector can be replaced by the reference vector: 



e(n + 1) = y(n + 1) - [A\n), B\n)\ 



X(n + 1 ) 
Y(n) 



(4.150) 



A(n + 1 ) 
B(n+ 1) 



A») 

B(n ) 




X(n + 1 ) 
Y(n) 



e(n + 1) 



(4.151) 



This is the equation error technique. The filter is said to be of the series- 
parallel type; its diagram is shown in Figure 4.12(b). Now, only FIR filter 
sections are used, and there is no fundamental stability problem anymore. 
The performance analysis can be carried out as in the above sections. The 
stability bound for the adaptation step size is 



0 < 8 < 



2 

L<j~ + Kcs\, 



(4.152) 



Overall the performance of the series-parallel HR gradient adaptive filter 
can be derived from that of the FIR filter by changing Na\ into Lay + Ka\. 

In order to compare the performance of the parallel type and series- 
parallel approaches, let us consider the expectation of the recursive coeffi- 
cient vector after convergence, B O0 , for the parallel case. Equations (4.147) 
and (4.148) yield 

B x = E[Y{n)Y\n)\- X E{Y{n)[y{n + 1) - A'(n)X(n + 1)]} (4.153) 

The parallel-series type yields a similar equation, but with E[Y(n) T'(n)] -1 ; if 
the output error is approximated by a white noise with power a^, then 

E[ Y(n ) Y l (n)} = a]l N + E[ Y{n) Y l (n)\ (4. 1 54) 

and a bias is introduced on the recursive coefficients. The above equation 
clearly illustrates the stability hazards associated with using Y(n), because 
the matrix can become singular. Therefore, the residual error is larger with 
the parallel-series approach, while the adaptation speed is not significantly 
modified, particularly for small step sizes, because the initial error sequences 
are about the same for both types. 
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Finally, several structures are available, and HR gradient adaptive filters 
can be an attractive alternative to FIR filters in relevant applications. 

4.16. NONLINEAR FILTERING 

The digital filters considered up to now have been linear filters, which means 
that the output is a linear function of the input data. We can have a non- 
linear scalar function of the input data vector: 



The Taylor series expansion of the function f(X) about the vector zero is 



with differential operator notation. When limited to second order, the 
expansion is 



where y 0 is a constant, H is the vector of the linear coefficients, and M is the 
square matrix of the quadratic coefficients, the filter length N being the 
number of elements of the data vector X(n). This nonlinear filter is called 
the second-order Volterra filter (SVF) [22], 

The quadratic coefficient matrix M is symmetric because the data matrix 
X(n)X l (n) is symmetric. Also, if the input and reference signals are assumed 
to have zero mean, y(n) must also have zero mean, which implies 



When this structure is used in an adaptive filter configuration, the coeffi- 
cients must be calculated to minimize the output MSE, E{(y(n) — y(n)) 2 }. 
For Gaussian signals, the optimum coefficients are 



It is worth pointing out that the linear operator of the optimum SVF, in 
these conditions, is exactly the optimum linear filter. Thus, the nonlinear 
filter can be constructed by adding a quadratic section in parallel to the 
linear filter, as shown in Figure 4.13. 



yin) =f[X(ri)\ 



(4.155) 




(4.156) 



y(n) — jo + H*X(n ) + trac e(MX(n)X'(n)) 



(4.157) 



E[ y(n)\ = y 0 + trac e(MR) 



(4.158) 



Therefore (4.157) can be rewritten as 

y(n) — H'X(n) + trac z{M[X(ri)X\ri) — R]) 



(4.159) 



H opt = R-'E[y(n)X(n )] 

M opt = \R- l E[y(n)X(n)X\n)lR^ 



(4.160) 
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FIG. 4.13 Second-order nonlinear filter for Gaussian signals. 
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The minimum output MSE is 

-Emin = £■[/(«)] - E[ y(n)X(n)]'R - 1 E[y(n)X («)] 
— i trace(7? _ 1 y(/?)A(«)A'(«)] R 1 -E[ 



The gradient techniques can be implemented by calculating the deriva- 
tives of the output error with respect to the coefficients. The gradient adap- 
tive SVF equations are 

e{n + 1) = y(n + 1) — H\n)X{n + 1) 

- trace(M(«)[X(« + 1 )X'(n + 1) - /?]) 

H(n + 1) = H(n) + 8/,X(n + 1 )e(n + 1 ) 

M(n + 1 ) = M (n) + 8 m X(n + X)X\n + \)e(n + 1) 



where 8 h and 8 m are the adaptation steps. 

The zeroth-order term trac e(M(n)R) is not constant in the adaptive 
implementation. It can be replaced by an estimate of the mean value of 
the quadratic section output, for example, using the recursive estimator of 
Section 3.3. 

The stability bounds for the adaptation steps can be obtained as in 
Section 4.2 by considering the a posteriori error s(n + 1): 

e(« + 1) = e(n + 1)[1 — 8 h X\n + 1 )X(n + 1) 

- 8 m trace(X(n + 1 )X‘(n + 1 )[X(n + 1 )X‘(n + 1) - /?])] 

(4.163) 

Assuming that the linear operator acts independently, we adopt condition 
(4.7) for 8 h . Now, the stability condition for 8 m is 

1 1 — 5,„(trace E[X{n + 1 )X‘(n + 1 )X(n + 1 )X\n + 1)] — trace R 2 ) \ < 1 

The following approximation can be made: 

trace E[X(n + 1 )X‘(n + 1 )X(n + 1 )X\n + 1)] « (Na^) 2 > trace R 2 

(4.164) 

Hence, we have the stability condition 

0 < 8 m < — (4.165) 

{Naif 

The total output error is the sum of the minimum error E nnn given by 
(4.140) and the excess MSEs of the linear and quadratic sections. Using 
developments as in Section 4.3, one can show the excess MSE of the quad- 
ratic section E M can be approximated by 
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(4.166) 



E m * yF ml „[(Mr 2 ) 2 + 2 trace R 2 ] 

In practice, the quadratic section in general serves as a complement to the 
linear section. Indeed the improvement must be worth the price paid in 
additional computational complexity [23]. 



4.17. STRENGTHS AND WEAKNESSES OF 
GRADIENT FILTERS 

The strong points of the gradient adaptive filters, illustrated throughout this 
chapter, are their ease of design, their simplicity of realization, their flex- 
ibility, and their robustness against signal characteristic evolution and com- 
putation errors. 

The stability conditions have been derived, the residual error has been 
estimated, and the learning curves have been studied. Simple expressions 
have been given for the stability bound, the residual error, and the time 
constant in terms of the adaptation step size. Word-length limitation effects 
have been investigated, and estimates have been derived for the coeffficient 
and internal data word lengths as a function of the specifications. Useful 
variations from the classical LMS algorithm have been discussed. In short, 
all the knowledge necessary for a smart and successful engineering applica- 
tion has been provided. 

Although gradient adaptive filters are attractive, their performance is 
severely limited in some applications. Their main weakness is their depen- 
dence on signal statistics, which can lead to low speed or large residual 
errors. They give their best results with flat spectrum signals, but if the 
signals have a fine structure they can be inefficient and unable, for example, 
to perform simple analysis tasks. For these cases LS adaptive filters offer an 
attractive solution. 



EXERCISES 

1. A sinusoidal signal x(n) — sin(«7r/2) is applied to a second-order linear 
predictor as in Figure 4.3. Calculate the theoretical ACF of the signal 
and the prediction coefficients. Verify that the zeros of the FIR pre- 
diction filter are on the unit circle at the right frequency. 

Using the LMS algorithm (4.3) with S — 0.1, show the evolution of 
the coefficients from time n = 0 to n — 10. How is that evolution mod- 
ified if the MLAV algorithm (4.77) and the sign algorithm (4.87) are 
used instead. 

2. A second-order adaptive FIR filter has the above x(n) as input and 
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